SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY, AUG. 2007. 






Capacity Region of the Finite-State Multiple 
Access Channel with and without Feedback 

Haim Permuter and Tsachy Weissman 



Abstract 

The capacity region of the Finite-State Multiple Access Channel (FS-MAC) with feedback that may be an
arbitrary time-invariant function of the channel output samples is considered. We characterize both an inner and an
outer bound for this region, using Massey's directed information. These bounds are shown to coincide, and hence
yield the capacity region, of FS-MACs where the state process is stationary and ergodic and not affected by the
inputs. Though 'multi-letter' in general, our results yield explicit conclusions when applied to specific scenarios of
interest. E.g., our results allow us to:

• Identify a large class of FS-MACs, that includes the additive mod-2 noise MAC where the noise may have 
memory, for which feedback does not enlarge the capacity region. 

• Deduce that, for a general FS-MAC with states that are not affected by the input, if the capacity (region) without 
feedback is zero, then so is the capacity (region) with feedback. 

• Deduce that, for a MAC that can be decomposed into a 'multiplexer' concatenated by a point-to-point
channel (with, without, or with partial feedback), the capacity region is given by $\sum_m R_m \le C$, where
$C$ is the capacity of the point-to-point channel and $m$ indexes the encoders. Moreover, we show that for this
family of channels source-channel coding separation holds.

Index Terms

Feedback capacity, multiple access channel, capacity region, directed information, causal conditioning, code-tree,
source-channel coding separation, sup-additivity of sets.

I. Introduction 

The Multiple Access Channel (MAC) has received much attention in the literature. To put our contributions 
in context, we begin by briefly describing some of the key results in the area. The capacity region for the 
memoryless MAC was derived by Ahlswede in [1]. Cover and Leung derived an achievable region for a memoryless
MAC with feedback in [2]. Using block Markov encoding, superposition and list codes, they showed that the
region $R_1 \le I(X_1;Y|X_2,U)$, $R_2 \le I(X_2;Y|X_1,U)$ and $R_1 + R_2 \le I(X_1,X_2;Y)$, where
$P(u,x_1,x_2,y) = p(u)p(x_1|u)p(x_2|u)p(y|x_1,x_2)$, is achievable for a memoryless MAC with feedback. Willems showed in [3] that
the achievable region given by Cover and Leung for a memoryless channel with feedback is optimal for a class of
channels where one of the inputs is a deterministic function of the output and the other input. More recently Bross 
and Lapidoth [4] improved Cover and Leung's region, and Wu et al. [5] have extended Cover and Leung's region
for the case that non-causal state information is available at both encoders. 

Ozarow derived the capacity of a memoryless Gaussian MAC with feedback in [6], and showed it to be achievable 
via a modification of the Schalkwijk-Kailath scheme [7]. In general, the capacity in the presence of noisy feedback 
is an open question for the point-to-point channel and a fortiori for the MAC. Lapidoth and Wigger [8] presented an 
achievable region for the case of the Gaussian MAC with noisy feedback and showed that it converges to Ozarow's 
noiseless-feedback sum-rate capacity as the feedback-noise variance tends to zero. Other recent variations on the 
Schalkwijk-Kailath scheme of relevance to the themes of our work include the case of quantization noise in the 
feedback link [9] and the case of interference known non-causally at the transmitter [10]. 

Verdu characterized the capacity region of a multiple-access channel of the form
$P(y_i|x_1^i,x_2^i,y^{i-1}) = P(y_i|x_{1,i-m}^{i},x_{2,i-m}^{i})$ without feedback in [11]. Verdu further showed in that work that in the absence of frame
synchronism between the two users, i.e., there is a random shift between the users, only stationary input distributions 
need be considered. Cheng and Verdu built on the capacity result from [11] in [12] to show that for a Gaussian 
MAC there exists a water-filling solution that generalizes the point-to-point Gaussian channel. 

In [13] [14], Kramer derived several capacity results for discrete memoryless networks with feedback. By using 
the idea of code-trees instead of code-words, Kramer derived a 'multi-letter' expression for the capacity of the
discrete memoryless MAC. One of the main results we develop in the present paper extends Kramer's capacity 
result to the case of a stationary and ergodic Markov Finite-State MAC (FS-MAC), to be formally defined below. 

In [15] [16], Han used the information-spectrum method in order to derive the capacity of a general MAC 
without feedback, when the channel transition probabilities are arbitrary for every n symbols. Han also considered 
the additive mod-q MAC, which we shall use here to illustrate the way in which our general results characterize 
special cases of interest. In particular, our results will imply that feedback does not increase the capacity region of 
the additive mod-q MAC. 

In this work, we consider the capacity region of the Finite-State Multiple Access Channel (FS-MAC), with 
feedback that may be an arbitrary time-invariant function of the channel output samples. We characterize both an 
inner and an outer bound for this region. We further show that these bounds coincide, and hence yield the capacity 
region, for the important subfamily of FS-MACs with states that evolve independently of the channel inputs. Our 
derivation of the capacity region is rooted in the derivation of the capacity of finite-state channels in Gallager's 
book [17, ch 4,5]. More recently, Lapidoth and Telatar [18] have used it in order to derive the capacity of a 
compound channel without feedback, where the compound channel consists of a family of finite-state channels. In 
particular, they have introduced into Gallager's proof the idea of concatenating codewords, which we extend here 
to concatenating code-trees. 

Though 'multi-letter' in general, our results yield explicit conclusions when applied to more specific families 
of MACs. For example, we find that feedback does not increase the capacity of the mod-$q$ additive noise MAC
(where $q$ is the size of the common alphabet of the input, output and noise), regardless of the memory in the
noise. This result is in sharp contrast with the finding of Gaarder and Wolf in [19] that feedback can increase the 
capacity even of a memoryless MAC due to cooperation between senders that it can create. Our result should also 
be considered in light of Alajaji's work [20], where it was shown that feedback does not increase the capacity of 
discrete point-to-point channels with mod-q additive noise. Thus, this part of our contribution can be considered 
a multi-terminal extension of Alajaji's result. Our results will in fact allow us to identify a class of MACs larger 
than that of the mod-$q$ additive noise MAC for which feedback does not enlarge the capacity region.

Further specialization of the results will allow us to deduce that, for a general FS-MAC with states that are 
not affected by the input, if the capacity (region) without feedback is zero, then so is the capacity (region) with 
feedback. It will also allow us to identify a large class of FS-MACs for which source-channel coding separation 
holds. 

The remainder of this paper is organized as follows. We concretely describe our channel model and assumptions
in Section II. In Section III we introduce some notation, tools and results pertaining to directed information and the
notion of causal conditioning that will be key in later sections. We state our main results in Section IV. In Section V
we apply the general results of Section IV to obtain the capacity region for several interesting classes of channels,
as well as establish a source-channel separation result. The validity of our inner and outer bounds is established,
respectively, in Section VI and Section VII. In Section VIII we show that our inner and outer bounds coincide,
and hence yield the capacity region, when applied to the FS-MAC without feedback. This result can be thought
of as the natural extension of Gallager's results [17, Ch. 4] to the MAC or, alternatively, as the natural extension
of Gallager's derivation of the MAC capacity region in [21] to channels with states. In Section IX we characterize
the capacity region for the case of arbitrary (time-invariant) feedback and FS-MAC channels with states that evolve
independently of the input, as well as the FS-MAC with limited ISI (which is the natural MAC-analogue of Kim's
point-to-point channel [22]), by showing that our inner and outer bounds coincide for this case. We conclude in
Section X with a summary of our contribution and a related future research direction.

II. Channel Model 

In this paper, we consider an FS-MAC (finite-state MAC) with time-invariant feedback, as illustrated in Fig. 1.
The MAC setting consists of two senders and one receiver. Each sender $l \in \{1,2\}$ chooses an index $m_l$ uniformly
from the set $\{1,\dots,2^{nR_l}\}$ and independently of the other sender. The input to the channel from encoder $l$ is
denoted by $\{X_{l1}, X_{l2}, X_{l3}, \dots\}$, and the output of the channel is denoted by $\{Y_1, Y_2, Y_3, \dots\}$. The state at time $i$,
i.e., $S_i \in \mathcal{S}$, takes values in a finite set of possible states. The channel is stationary and is characterized by a
conditional probability $P(y_i, s_i|x_{1i}, x_{2i}, s_{i-1})$ that satisfies

$$P(y_i, s_i|x_1^i, x_2^i, s^{i-1}, y^{i-1}) = P(y_i, s_i|x_{1i}, x_{2i}, s_{i-1}), \tag{1}$$

where the superscripts denote sequences in the following way: $x_l^i = (x_{l1}, x_{l2}, \dots, x_{li})$, $l \in \{1,2\}$. We assume a
communication with feedback $z_l^i$, where the element $z_{li}$ is a time-invariant function of the output $y_i$. For example,
$z_{li}$ could equal $y_i$ (perfect feedback), or a quantized version of $y_i$, or null (no feedback). The encoders receive the
feedback samples with one unit delay.

Fig. 1. Channel with feedback that is a time-invariant deterministic function of the output.

A code with feedback consists of two encoding functions $g_l : \{1,\dots,2^{nR_l}\} \times \mathcal{Z}_l^{n-1} \to \mathcal{X}_l^n$, $l = 1,2$, where the
$k$th coordinate of $x_l^n \in \mathcal{X}_l^n$ is given by the function

$$x_{lk} = g_{lk}(m_l, z_l^{k-1}), \quad k = 1,2,\dots,n, \quad l = 1,2, \tag{2}$$

and a decoding function,

$$g : \mathcal{Y}^n \to \{1,\dots,2^{nR_1}\} \times \{1,\dots,2^{nR_2}\}. \tag{3}$$

The average probability of error for a $((2^{nR_1}, 2^{nR_2}), n)$ code is defined as

$$P_e^{(n)} = \frac{1}{2^{n(R_1+R_2)}} \sum_{m_1,m_2} \Pr\{g(Y^n) \ne (m_1,m_2) \mid (m_1,m_2) \text{ sent}\}. \tag{4}$$

A rate pair $(R_1,R_2)$ is said to be achievable for the MAC if there exists a sequence of $((2^{nR_1}, 2^{nR_2}), n)$ codes with
$P_e^{(n)} \to 0$. The capacity region of the MAC is the closure of the set of achievable $(R_1,R_2)$ rates.



III. Directed Information 

Throughout this paper we use the causal conditioning notation $(\cdot\|\cdot)$. We denote the probability mass function
(pmf) of $Y^N$ causally conditioned on $X^{N-d}$, for some integer $d \ge 0$, as $P(y^N\|x^{N-d})$, which is defined as

$$P(y^N\|x^{N-d}) \triangleq \prod_{i=1}^{N} P(y_i|y^{i-1}, x^{i-d}), \tag{5}$$

(if $i - d \le 0$ then $x^{i-d}$ is set to null). In particular, we extensively use the cases where $d = 0, 1$:

$$P(y^N\|x^N) \triangleq \prod_{i=1}^{N} P(y_i|y^{i-1}, x^{i}), \tag{6}$$

$$Q(x^N\|y^{N-1}) \triangleq \prod_{i=1}^{N} Q(x_i|x^{i-1}, y^{i-1}), \tag{7}$$

where the letters $Q$ and $P$ are both used for denoting pmfs.

Directed information $I(X^N \to Y^N)$ was defined by Massey in [23] as

$$I(X^N \to Y^N) \triangleq \sum_{i=1}^{N} I(X^i; Y_i|Y^{i-1}). \tag{8}$$

It has been widely used in the characterization of capacity of point-to-point channels [22], [24]-[29], compound
channels [30], network capacity [14], [31], rate distortion [32]-[34] and computational biology [35], [36]. Directed
information can also be expressed in terms of causal conditioning as

$$I(X^N \to Y^N) = \sum_{i=1}^{N} I(X^i; Y_i|Y^{i-1}) = \mathbf{E}\left[\log \frac{P(Y^N\|X^N)}{P(Y^N)}\right], \tag{9}$$

where $\mathbf{E}$ denotes expectation. The directed information from $X^N$ to $Y^N$, conditioned on $S$, is denoted as
$I(X^N \to Y^N|S)$ and is defined as:

$$I(X^N \to Y^N|S) \triangleq \sum_{i=1}^{N} I(X^i; Y_i|Y^{i-1}, S). \tag{10}$$

Directed information from $X_1^N$ to $Y^N$ causally conditioned on $X_2^N$ is defined as

$$I(X_1^N \to Y^N\|X_2^N) \triangleq \sum_{i=1}^{N} I(X_1^i; Y_i|X_2^i, Y^{i-1}) = \mathbf{E}\left[\log \frac{P(Y^N\|X_1^N, X_2^N)}{P(Y^N\|X_2^N)}\right], \tag{11}$$

where $P(y^N\|x_1^N, x_2^N) = \prod_{i=1}^{N} P(y_i|y^{i-1}, x_1^i, x_2^i)$.

Throughout this paper we use several properties of causal conditioning and directed information that follow
from the definitions and simple algebra. Many of the key properties that hold for mutual information and regular
conditioning carry over to directed information and causal conditioning, where $P(x^N)$ is replaced by $P(x^N\|y^{N-1})$
and $P(y^N|x^N)$ is replaced by $P(y^N\|x^N)$. Specifically,

Lemma 1: (Analogue to $P(x_1^N, y^N) = P(x_1^N)P(y^N|x_1^N)$.) For arbitrary random vectors $(X_1^N, X_2^N, Y^N)$,

$$P(x_1^N, y^N) = P(x_1^N\|y^{N-1})\,P(y^N\|x_1^N), \tag{12}$$

$$P(x_1^N, y^N\|x_2^N) = P(x_1^N\|y^{N-1}, x_2^N)\,P(y^N\|x_1^N, x_2^N). \tag{13}$$

Lemma 2: (Analogue to $|I(X_1^N; Y^N) - I(X_1^N; Y^N|S)| \le H(S)$.) For arbitrary random vectors and variables,

$$|I(X_1^N \to Y^N) - I(X_1^N \to Y^N|S)| \le H(S) \le \log|\mathcal{S}|, \tag{14}$$

$$|I(X_1^N \to Y^N\|X_2^N) - I(X_1^N \to Y^N\|X_2^N, S)| \le H(S) \le \log|\mathcal{S}|. \tag{15}$$

The proofs of Lemma 1 and Lemma 2 can be found in [27, Sec. IV], along with some additional properties of causal
conditioning and directed information. The next lemma, which is proven in Appendix I, shows that by replacing
the regular pmf with the causal conditioning pmf we get the directed information. Let us denote the mutual information
$I(X_1^N; Y^N|X_2^N)$ as a functional of $Q(x_1^N, x_2^N)$ and $P(y^N|x_1^N, x_2^N)$, i.e.,
$\mathcal{I}(Q(x_1^N, x_2^N); P(y^N|x_1^N, x_2^N)) \triangleq I(X_1^N; Y^N|X_2^N)$. Consider the case that the random variables
$X_1^N, X_2^N$ are independent, i.e., $Q(x_1^N, x_2^N) = Q(x_1^N)Q(x_2^N)$; then by definition

$$\mathcal{I}(Q(x_1^N)Q(x_2^N); P(y^N|x_1^N, x_2^N)) \triangleq \sum_{y^N, x_1^N, x_2^N} Q(x_1^N)Q(x_2^N)P(y^N|x_1^N, x_2^N) \log \frac{P(y^N|x_1^N, x_2^N)}{\sum_{x_1'^N} Q(x_1'^N)P(y^N|x_1'^N, x_2^N)}. \tag{16}$$

Lemma 3: If the random vectors $X_1^N$ and $X_2^N$ are causal-conditionally independent given $Y^{N-1}$, i.e.,
$Q(x_1^N, x_2^N\|y^{N-1}) = Q(x_1^N\|y^{N-1})Q(x_2^N\|y^{N-1})$, then

$$\mathcal{I}(Q(x_1^N\|y^{N-1})Q(x_2^N\|y^{N-1}); P(y^N\|x_1^N, x_2^N)) = I(X_1^N \to Y^N\|X_2^N). \tag{17}$$

The next lemma, which is proven in Appendix II, shows that in the absence of feedback, mutual information
becomes directed information.

Lemma 4: If $Q(x_1^N, x_2^N\|y^{N-1}) = Q(x_1^N)Q(x_2^N)$ then

$$I(X_1^N; Y^N|X_2^N) = I(X_1^N \to Y^N\|X_2^N). \tag{18}$$

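As a small numerical illustration of Massey's definition of directed information as a sum of conditional mutual informations, the following Python sketch evaluates $\sum_{i} I(X^i; Y_i|Y^{i-1})$ from a joint pmf over $(x^N, y^N)$. It is only a sketch under stated assumptions: the function names (`entropy`, `marginal`, `directed_information`), the dictionary representation of the pmf, and the use of base-2 logarithms are illustrative choices and are not taken from the paper.

```python
import itertools
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, nx, ny):
    """Marginal pmf of (x^nx, y^ny) from a joint pmf {(x^N, y^N): probability}."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(x[:nx], y[:ny])] += p
    return out


def directed_information(joint, N):
    """I(X^N -> Y^N) = sum_i I(X^i; Y_i | Y^{i-1}), in bits."""
    total = 0.0
    for i in range(1, N + 1):
        # H(Y_i | Y^{i-1}) = H(Y^i) - H(Y^{i-1})
        h_y = entropy(marginal(joint, 0, i)) - entropy(marginal(joint, 0, i - 1))
        # H(Y_i | Y^{i-1}, X^i) = H(X^i, Y^i) - H(X^i, Y^{i-1})
        h_y_given_x = entropy(marginal(joint, i, i)) - entropy(marginal(joint, i, i - 1))
        total += h_y - h_y_given_x
    return total


# Illustrative example: a unit-delay channel y_1 = 0, y_2 = x_1 with i.i.d. uniform inputs.
delay_joint = {(x, (0, x[0])): 0.25 for x in itertools.product((0, 1), repeat=2)}
# Swapping the roles of X and Y gives the directed information in the reverse direction.
reverse_joint = {(y, x): p for (x, y), p in delay_joint.items()}
forward = directed_information(delay_joint, 2)
backward = directed_information(reverse_joint, 2)
```

In this toy example the information flows only from $X$ to $Y$, so `forward` is one bit while `backward` is zero, even though the mutual information $I(X^2; Y^2)$ is symmetric.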
IV. Main Theorems 

We dedicate this section to a statement of our main results, proofs of which will appear in the subsequent sections. 
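Let $\underline{\mathcal{R}}_n$ denote the following region in $\mathbb{R}_+^2$ (2D set of nonnegative real numbers):

$$\underline{\mathcal{R}}_n = \bigcup_{Q(w)Q(x_1^n\|z_1^{n-1},w)Q(x_2^n\|z_2^{n-1},w)} \left\{ \begin{array}{l} R_1 \le \min_{s_0} \frac{1}{n} I(X_1^n \to Y^n\|X_2^n, W, s_0) - \frac{\log|\mathcal{S}|}{n}, \\ R_2 \le \min_{s_0} \frac{1}{n} I(X_2^n \to Y^n\|X_1^n, W, s_0) - \frac{\log|\mathcal{S}|}{n}, \\ R_1 + R_2 \le \min_{s_0} \frac{1}{n} I((X_1,X_2)^n \to Y^n|W, s_0) - \frac{\log|\mathcal{S}|}{n}. \end{array} \right. \tag{19}$$

Having the auxiliary random variable $W$ is equivalent to taking the convex hull of the region. It is shown in
the Appendix that the inclusion (or omission) of $W$ in the definition of the region $\underline{\mathcal{R}}_n$ has vanishing effect with
increasing $n$.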

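Theorem 5: (Inner bound.) For any FS-MAC with time-invariant feedback as shown in Fig. 1, and for any integer
$n \ge 1$, the region $\underline{\mathcal{R}}_n$ is achievable.

Let $\mathcal{R}_n$ denote the following region in $\mathbb{R}_+^2$:

$$\mathcal{R}_n = \bigcup_{Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})} \left\{ \begin{array}{l} R_1 \le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n), \\ R_2 \le \frac{1}{n} I(X_2^n \to Y^n\|X_1^n), \\ R_1 + R_2 \le \frac{1}{n} I((X_1,X_2)^n \to Y^n). \end{array} \right. \tag{20}$$

In the following theorem we use the standard notion of convergence of sets; see Appendix IV for the details of
the definition.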

Theorem 6: (Outer bound.) Let $(R_1,R_2)$ be an achievable pair for an FS-MAC with time-invariant feedback,
as shown in Fig. 1. Then, for any $n$ there exists a distribution $Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})$ such that the following
inequalities hold:

$$\begin{aligned} R_1 &\le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n) + \epsilon_n, \\ R_2 &\le \frac{1}{n} I(X_2^n \to Y^n\|X_1^n) + \epsilon_n, \\ R_1 + R_2 &\le \frac{1}{n} I((X_1,X_2)^n \to Y^n) + \epsilon_n, \end{aligned} \tag{21}$$

where $\epsilon_n$ goes to zero as $n$ goes to infinity. Moreover, the outer bound can be written as $\liminf \mathcal{R}_n$.

For the case where there is no feedback, i.e., $z_i$ is null, $\mathcal{R}_n$ and $\underline{\mathcal{R}}_n$ can be expressed in terms of mutual
information and regular conditioning due to Lemma 4.

Theorem 7: (Capacity of FS-MAC without feedback.) For any indecomposable FS-MAC without feedback, the
achievable region is $\lim_{n\to\infty} \mathcal{R}_n$, and the limit exists.

Theorem 8: (Capacity of FS-MAC with feedback.) For any FS-MAC of the form

$$P(y_i, s_i|x_{1,i}, x_{2,i}, s_{i-1}) = P(s_i|s_{i-1})\,P(y_i|x_{1,i}, x_{2,i}, s_{i-1}), \tag{22}$$

where the state process $S_i$ is stationary and ergodic, the achievable region is $\lim_{n\to\infty} \mathcal{R}_n$, and the limit exists.

The next theorems will be seen to be consequences of the capacity theorems given above.

Theorem 9: For the channel described in (22), where the state process $S_i$ is stationary and ergodic, if the capacity
without feedback is zero, then it is also zero in the case that there is feedback.

Corollary 10: For a memoryless MAC, the capacity with feedback is zero if and only if it is zero without
feedback.

Corollary 11: Feedback does not enlarge the capacity region of a discrete additive (mod-$|\mathcal{X}|$) noise MAC.
In fact, among other results, we will see in the next section that the (mod-$|\mathcal{X}|$) noise MAC is only a subset of a
larger family of MACs for which feedback does not enlarge the capacity region.

V. Applications 

The capacity formula of an FS-MAC given in Theorems 7 and 8 is a multi-letter characterization. In general, it
is very hard to evaluate but, for the finite-state point-to-point channel, there are several cases where the capacity
with and without feedback was found numerically [37], [38], [26], [25] and analytically [28].¹

The multi-letter capacity expression is also valuable for deriving useful concepts in communication. For instance,
in order to show that feedback does not increase the capacity of a memoryless channel (cf. [43]), we can use
the multi-letter upper bound of a channel with memory. Further, in [27] it was shown that for the cases where
the capacity is given by the multi-letter expression $C = \lim_{N\to\infty} \frac{1}{N} \max_{Q(x^N\|z^{N-1})} I(X^N \to Y^N)$, the source-channel
coding separation holds. It was also shown that if the state of the channel is known at both the encoder and
decoder and the channel is connected (i.e., every state can be reached with some positive probability from every
other state under some input distribution), then feedback does not increase the capacity of the channel.

¹For the Gaussian case without feedback there exists the water-filling solution [39], and recently the feedback capacity was found analytically
for the case that the noise is an ARMA(1)-Gaussian process (cf. [40]-[42]).

In this section we use the capacity formula in order to derive three conclusions:

1) For a stationary and ergodic Markovian channel, the capacity without feedback is zero if and only if the
capacity with feedback is zero.

2) We identify FS-MACs for which feedback does not enlarge the capacity region, and show that for a MAC that can be
decomposed into a 'multiplexer' concatenated by a point-to-point channel (with, without, or with partial
feedback), the capacity region is given by $\sum_m R_m \le C$, where $C$ is the capacity of the point-to-point
channel.

3) Source-channel coding separation holds for a MAC that can be decomposed into a 'multiplexer' concatenated
by a point-to-point channel (with, without, or with partial feedback).

As a special case of the second concept we show that the capacity region of a binary Gilbert-Elliot MAC is
$R_1 + R_2 \le 1 - \mathcal{H}(V)$, where $\mathcal{H}(V)$ is the entropy rate of the hidden Markov noise that specifies the binary Gilbert-Elliot MAC.

A. Zero capacity 

The first concept is given in Theorem 9 and is proved here. The proof of Theorem 9 is based on the following
lemma, which is proven in Appendix III.

Lemma 12: For a MAC described by an arbitrary causal conditioning $P(y^n\|x_1^n, x_2^n)$ the following holds:

$$\max_{Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})} I(X_1^n, X_2^n \to Y^n) = 0 \iff \max_{Q(x_1^n)Q(x_2^n)} I(X_1^n, X_2^n \to Y^n) = 0, \tag{23}$$

and each condition also implies that $P(y^n\|x_1^n, x_2^n) = P(y^n)$ for all $x_1^n, x_2^n$.

Proof of Theorem 9: Since the channel is a Markovian channel, i.e.,

$$P(y_i, s_i|x_{1,i}, x_{2,i}, s_{i-1}) = P(s_i|s_{i-1})\,P(y_i|x_{1,i}, x_{2,i}, s_{i-1}), \tag{24}$$

and stationary and ergodic, its capacity region is given in Theorem 8 as $\mathcal{C} = \lim_{n\to\infty} \mathcal{R}_n$. Furthermore, since
the sequence $\{\mathcal{R}_n\}$ is sup-additive (Lemma 22), then according to Lemma 23, which is given in Appendix IV,
$\lim_{n\to\infty} \mathcal{R}_n = \mathrm{cl}\left(\bigcup_{n \ge 1} \mathcal{R}_n\right)$, implying that if the capacity without feedback is zero, then for all $n \ge 1$

$$\max_{Q(x_1^n)Q(x_2^n)} I(X_1^n, X_2^n \to Y^n) = 0. \tag{25}$$

According to Lemma 12, the maximum of the objective in (25) over the distributions
$Q(x_1^n\|y^{n-1})Q(x_2^n\|y^{n-1})$ is still zero; hence, the capacity region is zero even if there is perfect feedback.
■

Corollary 10, which states that the capacity of a memoryless MAC without feedback is zero if and only if the
capacity with feedback is zero, follows immediately from Theorem 9 because a memoryless MAC can be considered
an FS-MAC with one state.

Clearly, Theorem 9 also holds for the case of a stationary and ergodic FS-Markov point-to-point channel because
a MAC is an extension of a point-to-point channel. However, it does not hold for the case of a broadcast channel.
For instance, consider the binary broadcast channel given by $y_{1,i} = x_i \oplus n_i$ and $y_{2,i} = x_i \oplus n_{i-1}$, where $n_i$ is i.i.d.
Bernoulli(1/2) and $\oplus$ denotes addition mod-2. The capacity without feedback is clearly zero, but if the transmitter
has feedback, namely if it knows $y_{1,i-1}$ and $y_{2,i-1}$ at time $i$, then it can compute the noise $n_{i-1} = y_{1,i-1} \oplus x_{i-1}$
and therefore it can transmit 1 bit per channel use to the second user.
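The feedback strategy in this broadcast example can be checked with a short simulation. The Python sketch below is illustrative only: the function name `broadcast_with_feedback`, the seeding, and the convention that the initial noise sample $n_0$ equals 0 and is known to the transmitter are assumptions made here, not details given in the paper.

```python
import random


def broadcast_with_feedback(message_bits, seed=0):
    """Simulate y1_i = x_i ^ n_i and y2_i = x_i ^ n_{i-1} with y1 fed back.

    Returns the sequence y2 observed by the second user.
    """
    rng = random.Random(seed)
    true_prev_noise = 0  # n_0, assumed to be 0 and known (illustrative convention)
    est_prev_noise = 0   # the transmitter's knowledge of n_{i-1}
    y2_seq = []
    for bit in message_bits:
        x = bit ^ est_prev_noise      # pre-cancel the noise that user 2 will see
        noise = rng.randint(0, 1)     # n_i ~ Bernoulli(1/2)
        y1 = x ^ noise                # output at user 1
        y2 = x ^ true_prev_noise      # output at user 2
        y2_seq.append(y2)
        est_prev_noise = y1 ^ x       # feedback of y1 reveals n_i = y1_i xor x_i
        true_prev_noise = noise
    return y2_seq


sent = [1, 0, 1, 1, 0, 0, 1, 0]
received = broadcast_with_feedback(sent, seed=3)
```

Because the transmitter always knows the noise sample that will corrupt the second user's next output, `received` equals `sent` for every noise realization, which corresponds to 1 bit per channel use.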



B. Examples of channels for which feedback does not enlarge capacity 



Fig. 2. Gilbert-Elliot MAC. It has two states, "Good" and "Bad", where the transition between them is according to a first-order Markov process.
Given that the channel is in a "Good" (or a "Bad") state, it behaves as a binary additive noise channel where the noise is Bernoulli($p_G$) (or Bernoulli($p_B$)).



1) Gilbert-Elliot MAC: The Gilbert-Elliot channel is a widely used example of a finite-state channel. It is often
used to model wireless communication in the presence of fading [37], [38], [44]. The Gilbert-Elliot channel is a Markov
channel with two states, denoted as "good" and "bad". Each state is a binary symmetric channel and the probability
of flipping the bit is lower in the "good" state. In the case of the Gilbert-Elliot MAC (Fig. 2), each state is an
additive MAC with i.i.d. noise, where in the "good" state the probability that the noise is '1' is lower than in
the "bad" state. This channel can be represented as an additive MAC as in Fig. 2, where the noise is a hidden
Markov process.

Since the Gilbert-Elliot MAC is an ergodic FS-MAC, its capacity with feedback, when the initial state distribution
over the states "good" and "bad" is the stationary distribution, is given by $\lim_{n\to\infty} \mathcal{R}_n$ (Theorem 8). For the
Gilbert-Elliot MAC, the region $\lim_{n\to\infty} \mathcal{R}_n$ reduces to the simple region

$$R_1 + R_2 \le 1 - \mathcal{H}(V), \tag{26}$$

where $\mathcal{H}(V)$ denotes the entropy rate of the hidden Markov noise. The following equalities and inequalities upper
bound the region $\mathcal{R}_n$, and this upper bound can be achieved for any deterministic feedback by an i.i.d. input
distribution $X_{1,i} \sim$ Bernoulli(1/2) and $X_{2,i} \sim$ Bernoulli(1/2), $i = 1,2,\dots,n$, where $X_1^n$ and $X_2^n$ are independent of each
other.

$$\begin{aligned} I((X_1,X_2)^n \to Y^n) &= \sum_{i=1}^{n} H(Y_i|Y^{i-1}) - H(Y_i|Y^{i-1}, X_1^i, X_2^i) \\ &\overset{(a)}{=} \sum_{i=1}^{n} H(Y_i|Y^{i-1}) - H(V_i|V^{i-1}, X_1^i, X_2^i) \\ &\overset{(b)}{=} \sum_{i=1}^{n} H(Y_i|Y^{i-1}) - H(V_i|V^{i-1}) \\ &\overset{(c)}{\le} \sum_{i=1}^{n} \log 2 - H(V_i|V^{i-1}) \\ &= n - H(V^n). \end{aligned} \tag{27}$$

Equality (a) is due to the facts that $y_i$ is a function of $(v_i, x_{1,i}, x_{2,i})$ and $v_i$ is a deterministic function of
$(y_i, x_{1,i}, x_{2,i})$, i.e., $y_i = x_{1,i} \oplus x_{2,i} \oplus v_i$ and $v_i = y_i \oplus x_{1,i} \oplus x_{2,i}$. Equality (b) follows from the fact that $v_i$
is independent of the messages. Inequality (c) is due to the fact that the size of the alphabet $\mathcal{Y}$ is 2. Similarly,
$\frac{1}{n} I(X_1^n \to Y^n\|X_2^n) \le 1 - \frac{1}{n} H(V^n)$ and $\frac{1}{n} I(X_2^n \to Y^n\|X_1^n) \le 1 - \frac{1}{n} H(V^n)$, and equality is achieved with an
i.i.d. input distribution Bernoulli(1/2). Finally, by dividing both sides by $n$ and using the definition of entropy rate
$\mathcal{H}(V) = \lim_{n\to\infty} \frac{1}{n} H(V^n)$, we conclude the proof.
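To get a feel for the sum-rate bound $1 - \mathcal{H}(V)$, the entropy rate of the hidden Markov noise can be approximated numerically. The Python sketch below computes the block entropy $H(V^n)$ exactly by enumerating all $2^n$ noise sequences with a forward recursion, and uses $H(V_n|V^{n-1}) = H(V^n) - H(V^{n-1})$, which for a stationary process is non-increasing in $n$ and converges to the entropy rate. The function names, the parameter names (`a` for the good-to-bad transition probability, `b` for bad-to-good, `p_good`, `p_bad`), and the convention that the noise at time $i$ is emitted from the state at time $i$ are assumptions for illustration, not the paper's specification.

```python
import itertools
import math


def block_entropy(n, a, b, p_good, p_bad):
    """Exact H(V^n) in bits for two-state hidden Markov noise, started in stationarity.

    Assumes 0 < a, b so that the stationary distribution is well defined.
    """
    if n == 0:
        return 0.0
    trans = [[1 - a, a], [b, 1 - b]]       # state 0 = good, state 1 = bad
    stationary = [b / (a + b), a / (a + b)]
    flip = [p_good, p_bad]                 # P(V = 1 | state)
    total = 0.0
    for v in itertools.product((0, 1), repeat=n):
        # Forward recursion: alpha[s] = P(v_1, ..., v_k, S_k = s)
        alpha = [stationary[s] * (flip[s] if v[0] else 1 - flip[s]) for s in (0, 1)]
        for bit in v[1:]:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in (0, 1))
                * (flip[s] if bit else 1 - flip[s])
                for s in (0, 1)
            ]
        prob = sum(alpha)
        if prob > 0:
            total -= prob * math.log2(prob)
    return total


def sum_rate_estimate(n, a, b, p_good, p_bad):
    """1 - H(V_n | V^{n-1}): a lower estimate of 1 - H(V) that improves with n (n >= 1)."""
    cond = block_entropy(n, a, b, p_good, p_bad) - block_entropy(n - 1, a, b, p_good, p_bad)
    return 1.0 - cond


estimate = sum_rate_estimate(8, a=0.1, b=0.3, p_good=0.01, p_bad=0.3)
```

The enumeration is exponential in `n`, so this is only practical for short blocks; when `p_good == p_bad` the noise is i.i.d. and the estimate reduces to one minus the binary entropy of the flip probability.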



2) Multiplexer followed by a point-to-point channel: Here we extend the Gilbert-Elliot MAC to the case where
the discrete MAC can be decomposed into two components as shown in Fig. 3. The first component is a MAC
that can behave as a multiplexer and the second component is a point-to-point channel. The definitions of those
components are the following:

Fig. 3. Discrete MAC that can be decomposed into two parts. The first part is a MAC that behaves as a multiplexer and the second part is a
point-to-point channel.



Definition 1: A MAC behaves as a multiplexer if the inputs and the output have common alphabets and for all
$m \in \{1,\dots,M\}$ there exists a choice of input symbols for all senders except sender $m$, such that the output is the
$m$th input, i.e., $Y = X_m$.
An example of a multiplexer-MAC for the binary case is a MAC whose output is one of and/or/xor of the inputs.
For a general alphabet of size $q$ those operations could be max/min/addition-mod-$q$. For instance, if the channel is binary
with two users and it is addition-mod-2, i.e., $y = x_1 \oplus x_2$, then we can ensure that $y = x_1$ by choosing $x_2 = 0$.
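Definition 1 can be checked by brute force for small deterministic MACs. The following Python sketch is illustrative; the function name `behaves_as_multiplexer` and the representation of the MAC as a deterministic function on input tuples are assumptions made here for the example.

```python
import itertools


def behaves_as_multiplexer(channel, alphabet, num_senders):
    """Check Definition 1 for a deterministic MAC y = channel((x_1, ..., x_M)).

    For every sender m, look for fixed symbols of the other senders such that
    the output equals sender m's input for every symbol in the alphabet.
    """
    for m in range(num_senders):
        others = [k for k in range(num_senders) if k != m]
        found = False
        for fixed in itertools.product(alphabet, repeat=len(others)):
            def inputs(x_m):
                x = [None] * num_senders
                x[m] = x_m
                for k, symbol in zip(others, fixed):
                    x[k] = symbol
                return tuple(x)

            if all(channel(inputs(x_m)) == x_m for x_m in alphabet):
                found = True
                break
        if not found:
            return False
    return True


binary = (0, 1)
xor_ok = behaves_as_multiplexer(lambda x: x[0] ^ x[1], binary, 2)
and_ok = behaves_as_multiplexer(lambda x: x[0] & x[1], binary, 2)
or_ok = behaves_as_multiplexer(lambda x: x[0] | x[1], binary, 2)
mod5_ok = behaves_as_multiplexer(lambda x: sum(x) % 5, range(5), 3)
nand_ok = behaves_as_multiplexer(lambda x: 1 - (x[0] & x[1]), binary, 2)
```

Under this check, and/or/xor and addition mod-5 qualify as multiplexers, whereas a NAND of the inputs does not, since no fixed choice of the other input makes the output follow the selected input.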

Theorem 13: The capacity region of a multiplexer MAC followed by a point-to-point channel with a time-invariant
feedback to all encoders, as shown in Fig. 3, is

$$\sum_{m=1}^{M} R_m \le C, \tag{28}$$

where $C$ is the capacity of the point-to-point channel with the time-invariant feedback $z_{i-1}(y_{i-1})$.

Proof: The achievability is proved simply by time sharing. At each time, only one selected user sends
information and the other users send a constant input that ensures that the output is the input of the selected user.

The converse is based on the fact that the maximum rate that can be transmitted through the point-to-point
channel is $C$, and this is an upper bound on the sum-rate of the multiplexer-MAC. If it were not an upper bound for the
multiplexer-MAC, we could place a fictitious multiplexer-MAC before the point-to-point channel and thereby achieve
a rate higher than its capacity, which would be a contradiction. ■

3) Discrete additive MAC: An immediate consequence of Theorem 13 is an extension of Alajaji's result [20] to
the additive MAC, which is given in Corollary 11. Corollary 11 states that feedback does not enlarge the capacity
region of a discrete additive (mod-$|\mathcal{X}|$) noise MAC.

The proof of the corollary is based on the following observation. If feedback does not increase the capacity of
a particular point-to-point channel, then feedback also does not increase the capacity of the MUX followed by the
same particular channel. Specifically, feedback does not increase the achievable region of an additive MAC (Fig. 2),
and the achievable region is given by

$$\sum_{m=1}^{M} R_m \le \log q - \mathcal{H}(V), \tag{29}$$

where $\mathcal{H}(V)$ is the entropy rate of the additive noise.



Fig. 4. Additive noise MAC with and without feedback. The random variables $X_{1n},\dots,X_{Mn}, Y_n, V_n$, $n \in \{1,2,3,\dots\}$, are from a common
alphabet of size $q$, and they denote the input from sender $1,\dots,M$, the output and the noise at time $n$, respectively. The relation between the
random variables is given by $y_n = x_{1n} \oplus x_{2n} \oplus \dots \oplus x_{Mn} \oplus v_n$, where $\oplus$ denotes addition mod-$q$. The noise $V_n$, possibly with memory, is
independent of the messages $W_1,\dots,W_M$.

4) Multiplexer followed by erasure channel: Consider the case of the multiplexer-erasure MAC, which is a
multiplexer followed by an erasure channel, possibly with memory.

Definition 2: A point-to-point channel is called an erasure channel if the output at time $n$ can be written as
$Y_n = f(X_n, Z_n)$, and the following properties hold:

1) The alphabet of $Z$ is binary and the alphabet of $Y$ is the same as that of $X$ plus one additional symbol called the
erasure.

2) The process $Z_n$ is stationary and ergodic and is independent of the message.

3) If $z_n = 0$, then $y_n = x_n$, and if $z_n = 1$, then the output is an erasure regardless of the input.

For the multiplexer-erasure MAC we have the following corollary.

Corollary 14: The capacity region of the multiplexer-erasure MAC with or without feedback is

$$\sum_{m=1}^{M} R_m \le (1 - p_e)\log q, \tag{30}$$

where $p_e$ is the marginal probability of having an erasure. Moreover, even if the encoders have non-causal side
information, i.e., the encoders know non-causally where the erasures appear, the capacity region is still given by (30).

Proof: According to Theorem 13 the capacity region is

$$\sum_{m=1}^{M} R_m \le C, \tag{31}$$

where $C$ is the capacity of the erasure point-to-point channel. Diggavi and Grossglauser [45, Thm. 3.1] showed
that the capacity of a point-to-point erasure channel, with and without feedback, is given by $(1 - p_e)\log q$. Since
the probability of having an erasure does not depend on the input to the channel, we deduce that even in the
case where the encoder knows the sequence $Z^n$ non-causally, which is better than feedback, the transmitter can
transmit only a fraction $1 - p_e$ of the time; hence the capacity cannot exceed $(1 - p_e)\log q$. ■

5) Multiplexer followed by the trapdoor channel: In this example feedback increases the capacity. Based on the
fact that the capacity of the trapdoor channel with feedback [28] is the logarithm of the golden ratio, i.e., $\log\frac{\sqrt{5}+1}{2}$,
the achievable region of a multiplexer followed by the trapdoor channel is

$$\sum_{m=1}^{M} R_m \le \log\frac{\sqrt{5}+1}{2}. \tag{32}$$
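The sum-rate regions in the erasure and trapdoor examples are easy to evaluate numerically. The Python sketch below is illustrative only: the function names are hypothetical, and base-2 logarithms (rates in bits per channel use) are an assumption made here.

```python
import math


def erasure_sum_capacity(q, p_erasure):
    """Sum-rate limit (1 - p_e) * log2(q) of the multiplexer-erasure MAC, in bits."""
    return (1 - p_erasure) * math.log2(q)


def trapdoor_feedback_sum_capacity():
    """Sum-rate limit log2((sqrt(5) + 1) / 2) for a multiplexer followed by the trapdoor channel."""
    return math.log2((math.sqrt(5) + 1) / 2)


def in_sum_rate_region(rates, capacity):
    """True if the nonnegative rate tuple satisfies sum_m R_m <= capacity."""
    return all(r >= 0 for r in rates) and sum(rates) <= capacity


trapdoor = trapdoor_feedback_sum_capacity()      # roughly 0.694 bits per channel use
binary_erasure = erasure_sum_capacity(2, 0.25)   # 0.75 bits per channel use
```

For instance, the rate pair (0.3, 0.3) lies inside the trapdoor region while (0.4, 0.4) does not, since only the sum of the rates is constrained.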

C. Source-channel coding separation 

Cover, El-Gamal and Salehi [46] showed that, in general, the source channel separation does not hold for MACs 
even for a memoryless channel without feedback. However, for the case where the MAC is a discrete Multiplexer 
followed by a channel we now show that it does hold. 

We want to send the sequence of symbols U",!/^ 1 over the MAC, so that the receiver can reconstruct the 
sequence. To do this we can use a joint source-channel coding scheme where we send through the channel the 
symbols xi t i(v,i, z l ~ v ) and £2.1(^2 , z 1 ^ 1 ). The receiver looks at his received sequence Y n and makes an estimate 
17™, U%. The receiver makes an error if f7™ ^ {7™ or if ^ U^, i.e., the probability of error P e (n) is P e (n) = 
Pr((£f,Zy 2 ")^(E/?,C/ 2 ")). 
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Theorem 15: (Source-channel coding theorem for a Multiplexer followed by a channel.) Let {Ui,U2) n >i be a 
finite alphabet, jointly stationary and ergodic pair of processes and let the MAC channel be a multiplexer followed 
by a point-to-point channel with time invariant feedback and capacity C — limjv^oo -h maxg^Tii^n-i) I(X n ; Y n ) 
(e.g., a memoryless channel, an indecomposable FSC without feedback, stationary and ergodic Markovian channel). 
For the source and the MAC described above: 

(direct part.) There exists a source- channel code with P e (ll) -> 0, if H(Ui,U 2 ) < C, where B.{UxM%) is the 
entropy rate of the sources and C is the capacity of the point-to-point channel with a time-invariant feedback. 

(converse part). If H(Ui,U2) > C, then the probability of error is bounded away from zero (independent of the 
blocklength). 

Proof: The achievability is a straightforward consequence of the Slepian-Wolf result for ergodic and stationary processes [47] and the achievability of the multiplexer followed by a point-to-point channel. First, we encode the sources by using the Slepian-Wolf achievability scheme, where we assign every $u_1^n$ to one of $2^{nR_1}$ bins according to a uniform distribution on $\{1, \dots, 2^{nR_1}\}$ and independently we assign every $u_2^n$ to one of $2^{nR_2}$ bins according to a uniform distribution on $\{1, \dots, 2^{nR_2}\}$. Second, we encode the bins as if they were messages, as shown in Fig. 5.
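
The binning step can be sketched as follows. This is only a toy illustration of uniform random bin assignment for two sources; the block length, the rates, the seeds, and the function name `random_binning` are assumptions made for the example.

```python
import itertools
import math
import random

def random_binning(sequences, rate_bits, n, seed=0):
    """Assign each length-n sequence independently and uniformly
    to one of ceil(2^(n * rate_bits)) bins, labelled 1..num_bins."""
    rng = random.Random(seed)
    num_bins = math.ceil(2 ** (n * rate_bits))
    bins = {seq: rng.randint(1, num_bins) for seq in sequences}
    return bins, num_bins

n = 4
all_sequences = list(itertools.product((0, 1), repeat=n))
# The two sources are binned independently (different seeds here).
bins_u1, num_bins_u1 = random_binning(all_sequences, rate_bits=0.5, n=n, seed=1)
bins_u2, num_bins_u2 = random_binning(all_sequences, rate_bits=0.75, n=n, seed=2)
```

The bin indices then play the role of the messages that the channel encoders transmit.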

In the converse, we assume that there exists a sequence of codes with $P_e^{(n)} \to 0$, and we show that it implies that $H(\mathcal{U}_1, \mathcal{U}_2) \le C$. Fix a given coding scheme and consider the following:



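\[
\begin{aligned}
H(U_1^n, U_2^n) &\stackrel{(a)}{\le} I(U_1^n, U_2^n; \hat{U}_1^n, \hat{U}_2^n) + n\epsilon_n \\
&\stackrel{(b)}{\le} I(U_1^n, U_2^n; Y^n) + n\epsilon_n \\
&= H(Y^n) - H(Y^n \mid U_1^n, U_2^n) + n\epsilon_n \\
&= \sum_{i=1}^{n} \left[ H(Y_i \mid Y^{i-1}) - H(Y_i \mid U_1^n, U_2^n, Y^{i-1}) \right] + n\epsilon_n \\
&\stackrel{(c)}{=} \sum_{i=1}^{n} \left[ H(Y_i \mid Y^{i-1}) - H(Y_i \mid U_1^n, U_2^n, Y^{i-1}, X_1^i, X_2^i) \right] + n\epsilon_n \\
&\stackrel{(d)}{=} \sum_{i=1}^{n} \left[ H(Y_i \mid Y^{i-1}) - H(Y_i \mid Y^{i-1}, X_1^i, X_2^i) \right] + n\epsilon_n \\
&= \sum_{i=1}^{n} I(X_1^i, X_2^i; Y_i \mid Y^{i-1}) + n\epsilon_n \\
&\stackrel{(e)}{\le} \sum_{i=1}^{n} I(X_0^i; Y_i \mid Y^{i-1}) + n\epsilon_n \\
&= I(X_0^n \to Y^n) + n\epsilon_n \\
&\le \max_{Q(x_0^n\|z^{n-1})} I(X_0^n \to Y^n) + n\epsilon_n
\end{aligned}
\tag{33}
\]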

Inequality (a) is due to Fano's inequality, where $n\epsilon_n = 1 + P_e^{(n)} n \log(|\mathcal{U}_1||\mathcal{U}_2|)$. Inequality (b) follows from the data processing inequality because $(U_1^n, U_2^n) - Y^n - (\hat{U}_1^n, \hat{U}_2^n)$ form a Markov chain. Equality (c) is due to the fact that, for a given code, $X_1^i$ is a deterministic function of $U_1^n, Y^{i-1}$ and, similarly, $X_2^i$ is a deterministic function of $U_2^n, Y^{i-1}$. Equality (d) is due to the Markov chain $(U_1^n, U_2^n) - (X_1^i, X_2^i, Y^{i-1}) - Y_i$. The notation $X_{0,i}$ denotes the output of the multiplexer, which is also the input to the point-to-point channel at time $i$. The inequality in (e) is due to the data processing inequality, which can be invoked thanks to the fact that given $Y^{i-1}$ we have the Markov chain $(X_1^i, X_2^i) - X_0^i - Y_i$.

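By dividing both sides of (33) by $n$, taking the limit $n \to \infty$, and recalling that $C = \lim_{n\to\infty} \frac{1}{n} \max_{Q(x^n\|z^{n-1})} I(X^n \to Y^n)$, we have

\[
H(\mathcal{U}_1, \mathcal{U}_2) = \lim_{n\to\infty} \frac{1}{n} H(U_1^n, U_2^n) \le C.
\tag{34}
\]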



[Figure 5: block diagram. Each source sequence $U_l^n$, $l = 1, 2$, is mapped to a bin index $W_l(U_l^n) \in \{1, \dots, 2^{nR_l}\}$ and encoded as $X_{l,i}(W_l, Y^{i-1})$ using delayed feedback; the encoder outputs pass through the multiplexer and the point-to-point channel, and the decoder forms $\hat{W}_l(Y^n)$ and then $\hat{U}_l^n(\hat{W}_1, \hat{W}_2)$.]

Fig. 5. Source-channel coding separation in a discrete Multiplexer followed by a point-to-point channel.



VI. Proof of Achievability (Theorem 5)

The proof of achievability for the FS-MAC with feedback is similar to the proof of achievability for the point- 
to-point FSC given in [27, Sec. V], but there are two main differences: 

1) In the case of FSC, only one message is sent, and in the case of FS-MAC, two independent messages are 
sent, which requires that we analyze three different types of errors: the first type occurs when only the first 
message is decoded with error, the second type occurs when only the second message is decoded with error, 
and the third type occurs when both messages are decoded with error. 

2) In both cases, we generate the encoding scheme (code-trees) randomly but the distribution that is used is different. In the case of FSC we generate, for each message in $[1, \dots, 2^{NR}]$, a code-tree of length $N$ by using the causal conditioning distribution $Q^*(x^N\|z^{N-1}) = \arg\max_{Q(x^N\|z^{N-1})} \min_{s_0} I(X^N \to Y^N \mid s_0)$, and here we generate for each message in $[1, \dots, 2^{NR_l}]$, $l = 1, 2$, a code-tree of length $N = Kn$ by concatenating $K$ independent code-trees where each one is created with a causal conditioning distribution $Q(x_l^n\|z_l^{n-1})$, $l = 1, 2$.

Encoding scheme: Randomly generate for encoder $l \in \{1, 2\}$, $2^{NR_l}$ code-trees of length $N = Kn$ by drawing them with the fixed distributions $Q(x_l^n\|z_l^{n-1})$. In other words, given a feedback sequence $z_l^{N-1}$, the causal conditioning probability that the sequence $x_l^N$ will be mapped to a given message is

\[
Q(x_l^N\|z_l^{N-1}) = \prod_{k=1}^{K} Q\big(x_{l,(k-1)n+1}^{kn} \,\big\|\, z_{l,(k-1)n+1}^{kn-1}\big),
\tag{35}
\]

where $x_{l,(k-1)n+1}^{kn}$ denotes the vector $(x_{l,(k-1)n+1}, x_{l,(k-1)n+2}, \dots, x_{l,kn})$. Fig. 6 illustrates the concatenation of trees graphically. In order to shorten the notation we will sometimes use the notation $Q_N$ to denote $Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$ and we will express the concatenation of pmfs in (35) as $Q_N = \prod_{k=1}^{K} Q_n$.
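
The concatenation of causal conditioning pmfs can be illustrated with a short sketch. It assumes a binary alphabet and a hypothetical per-step rule `example_step_pmf` (repeat the last feedback symbol with probability 0.9); neither is from the paper. The point is only that each block of length $n$ restarts and sees the feedback of its own block alone.

```python
import itertools

def causal_prob(step_pmf, x, z):
    """Q(x^n || z^{n-1}) = prod_i q(x_i | x^{i-1}, z^{i-1})."""
    prob = 1.0
    for i in range(len(x)):
        prob *= step_pmf(tuple(x[:i]), tuple(z[:i]), x[i])
    return prob

def concatenated_prob(step_pmf, x, z, n):
    """Product over K = len(x) / n blocks; block k only uses the
    feedback symbols that fall inside block k."""
    assert len(x) % n == 0
    prob = 1.0
    for start in range(0, len(x), n):
        prob *= causal_prob(step_pmf, x[start:start + n],
                            z[start:start + n - 1])
    return prob

def example_step_pmf(x_past, z_past, x_now):
    """Hypothetical rule: uniform at the start of a block, otherwise
    repeat the most recent feedback symbol with probability 0.9."""
    if not z_past:
        return 0.5
    return 0.9 if x_now == z_past[-1] else 0.1

feedback = (1, 0, 1)          # z^{N-1} for N = 4
example = concatenated_prob(example_step_pmf, (0, 1, 1, 1), feedback, n=2)
```

For a fixed feedback sequence the concatenated pmf still sums to one over all input sequences, which is what makes it a valid code-tree distribution.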

[Figure 6: three panels, each drawn over the time indices $i = 1, 2, 3, \dots$: a codeword (case of no feedback), a code-tree (used in [27]), and a concatenated code-tree (used here).]

Fig. 6. Illustration of coding scheme for setting without feedback, setting with feedback as used for point-to-point channel [27] and a code-tree 
that was created by concatenating smaller code-trees. In the case of no feedback each message is mapped to a codeword, and in the case of 
feedback each message is mapped to a code-tree. The third scheme is a code-tree of depth 4 created by concatenating two trees of depth 2. 



Decoding Errors: For each code in the ensemble, the decoder uses maximum likelihood decoding and we want to upper bound the expected value $E[P_e]$ for this ensemble. Let $P_{e1}, P_{e2}, P_{e3}$ be defined as follows.

$P_{e1}$ (type 1 error): probability that the decoded pair $(\hat{m}_1, \hat{m}_2)$ satisfies $\hat{m}_1 \neq m_1$, $\hat{m}_2 = m_2$,
$P_{e2}$ (type 2 error): probability that the decoded pair $(\hat{m}_1, \hat{m}_2)$ satisfies $\hat{m}_1 = m_1$, $\hat{m}_2 \neq m_2$,
$P_{e3}$ (type 3 error): probability that the decoded pair $(\hat{m}_1, \hat{m}_2)$ satisfies $\hat{m}_1 \neq m_1$, $\hat{m}_2 \neq m_2$.

Because the error events are disjoint we have

\[
P_e = P_{e1} + P_{e2} + P_{e3}.
\tag{36}
\]

In the next sequence of theorems and lemmas, we upper bound the expected value of each error type and show that if $(R_1, R_2)$ satisfies the three inequalities that define $\underline{\mathcal{R}}_n$ then the corresponding $E[P_{ei}]$, $i = 1, 2, 3$, go to zero and hence $E[P_e]$ goes to zero.
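
The three disjoint error events can be made concrete with a short sketch that classifies a decoding outcome and checks the decomposition of the total error frequency. The function names and the sample outcomes are illustrative assumptions.

```python
def error_type(sent, decoded):
    """Return 0 (no error) or the error type 1, 2, 3 defined above."""
    (m1, m2), (d1, d2) = sent, decoded
    first_wrong, second_wrong = d1 != m1, d2 != m2
    if first_wrong and second_wrong:
        return 3
    if first_wrong:
        return 1
    if second_wrong:
        return 2
    return 0

def empirical_error_split(pairs):
    """pairs: list of (sent, decoded). Returns (P_e1, P_e2, P_e3, P_e)
    as empirical frequencies."""
    counts = [0, 0, 0, 0]
    for sent, decoded in pairs:
        counts[error_type(sent, decoded)] += 1
    total = len(pairs)
    p1, p2, p3 = (counts[i] / total for i in (1, 2, 3))
    return p1, p2, p3, 1 - counts[0] / total

# Hypothetical outcomes: one of each kind.
outcomes = [((1, 1), (1, 1)), ((1, 1), (2, 1)),
            ((1, 1), (1, 2)), ((1, 1), (2, 2))]
p_e1, p_e2, p_e3, p_e = empirical_error_split(outcomes)
```

Because the events are disjoint, the three frequencies always add up to the overall error frequency.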

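Theorem 16: Suppose that an arbitrary message pair $m_1, m_2$, $1 \le m_1 \le M_1$, $1 \le m_2 \le M_2$, enters the encoder with feedback and that ML decoding is employed. Let $E[P_{ei} \mid m_1, m_2]$ denote the probability of decoding error averaged over the ensemble of codes when the messages $m_1, m_2$ were sent. Then for any choice of $\rho$, $0 \le \rho \le 1$,

\[
E[P_{e1} \mid m_1, m_2] \le (M_1 - 1)^{\rho} \sum_{y^N, x_2^N} Q(x_2^N\|z_2^{N-1}) \left[ \sum_{x_1^N} Q(x_1^N\|z_1^{N-1}) P(y^N\|x_1^N, x_2^N)^{\frac{1}{1+\rho}} \right]^{1+\rho}
\tag{37}
\]

\[
E[P_{e2} \mid m_1, m_2] \le (M_2 - 1)^{\rho} \sum_{y^N, x_1^N} Q(x_1^N\|z_1^{N-1}) \left[ \sum_{x_2^N} Q(x_2^N\|z_2^{N-1}) P(y^N\|x_1^N, x_2^N)^{\frac{1}{1+\rho}} \right]^{1+\rho}
\tag{38}
\]

\[
E[P_{e3} \mid m_1, m_2] \le \big((M_1 - 1)(M_2 - 1)\big)^{\rho} \sum_{y^N} \left[ \sum_{x_1^N, x_2^N} Q(x_1^N\|z_1^{N-1}) Q(x_2^N\|z_2^{N-1}) P(y^N\|x_1^N, x_2^N)^{\frac{1}{1+\rho}} \right]^{1+\rho}
\tag{39}
\]

The proof is given in Appendix VI and is similar to [27, Theorem 9], except that here we take into account the fact that there are two encoders rather than one.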

Let $P_{ei}(s_0)$, $i = 1, 2, 3$, be the probability of error of type $i$ given that the initial state of the channel is $s_0$. Also let $R_1 = \frac{1}{N} \log M_1$ and $R_2 = \frac{1}{N} \log M_2$ be the rates of the code and $R_3$ be the sum rate, i.e. $R_3 = R_1 + R_2$. The following theorem establishes exponential bounds on $E[P_{ei}(s_0)]$.

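Theorem 17: The average probability of error over the ensemble, for all initial states $s_0$, and all $\rho$, $0 \le \rho \le 1$, is bounded as

\[
E[P_{ei}(s_0) \mid m_1, m_2] \le |\mathcal{S}|\, 2^{-N[F_{N,i}(\rho, Q_N) - \rho R_i]}, \quad i = 1, 2, 3
\tag{40}
\]

where

\[
F_{N,i}(\rho, Q_N) = -\frac{\rho \log |\mathcal{S}|}{N} + \min_{s_0} E_{N,i}(\rho, Q_N, s_0), \quad i = 1, 2, 3
\tag{41}
\]

\[
E_{N,1}(\rho, Q_N, s_0) = -\frac{1}{N} \log \sum_{y^N, x_2^N} Q(x_2^N\|z_2^{N-1}) \left[ \sum_{x_1^N} Q(x_1^N\|z_1^{N-1}) P(y^N\|x_1^N, x_2^N, s_0)^{\frac{1}{1+\rho}} \right]^{1+\rho}
\tag{42}
\]

\[
E_{N,2}(\rho, Q_N, s_0) = -\frac{1}{N} \log \sum_{y^N, x_1^N} Q(x_1^N\|z_1^{N-1}) \left[ \sum_{x_2^N} Q(x_2^N\|z_2^{N-1}) P(y^N\|x_1^N, x_2^N, s_0)^{\frac{1}{1+\rho}} \right]^{1+\rho}
\tag{43}
\]

\[
E_{N,3}(\rho, Q_N, s_0) = -\frac{1}{N} \log \sum_{y^N} \left[ \sum_{x_1^N, x_2^N} Q(x_1^N\|z_1^{N-1}) Q(x_2^N\|z_2^{N-1}) P(y^N\|x_1^N, x_2^N, s_0)^{\frac{1}{1+\rho}} \right]^{1+\rho}.
\tag{44}
\]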

The proof is based on algebraic manipulation of the bounds given in (37)-(39). It is similar to the proof of Theorem 9 in [27] and therefore omitted. There are two differences between the proofs (and both are straightforward to accommodate): First, here the input distribution $Q_N = Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$ is arbitrary while in [27] we chose the one that maximizes the error exponent. Second, here we bound the averaged error over the ensemble and in [27] we have an additional step where we claim that there exists a code that has an error that is bounded by the expression in (40). Because of this difference the bound on the probability of error in [27] has an additional factor of 4.

The following lemma presents a few properties of the functions $E_{N,i}(\rho, Q_N, s_0)$, $i = 1, 2, 3$, such as positivity of the function and its derivative, concavity with respect to $\rho$, and an upper bound on the derivative which is achieved for $\rho = 0$.

Lemma 18: The term $E_{N,i}(\rho, Q_N, s_0)$ has the following properties:

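\[
E_{N,i}(\rho, Q_N, s_0) \ge 0; \quad \rho \ge 0, \ i = 1, 2, 3,
\tag{45}
\]

\[
\begin{aligned}
\frac{1}{N} I(X_1^N \to Y^N\|X_2^N, s_0) &\ge \frac{\partial E_{N,1}(\rho, Q_N, s_0)}{\partial \rho} > 0; \quad \rho \ge 0, \\
\frac{1}{N} I(X_2^N \to Y^N\|X_1^N, s_0) &\ge \frac{\partial E_{N,2}(\rho, Q_N, s_0)}{\partial \rho} > 0; \quad \rho \ge 0, \\
\frac{1}{N} I(X_1^N, X_2^N \to Y^N \mid s_0) &\ge \frac{\partial E_{N,3}(\rho, Q_N, s_0)}{\partial \rho} > 0; \quad \rho \ge 0,
\end{aligned}
\tag{46}
\]

\[
\frac{\partial^2 E_{N,i}(\rho, Q_N, s_0)}{\partial \rho^2} \le 0; \quad \rho \ge 0, \ i = 1, 2, 3.
\tag{47}
\]

Furthermore, equality holds in (45) when $\rho = 0$, and equality holds on the left sides of eq. (46) when $\rho = 0$ for $i = 1, 2, 3$.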
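
These properties can be checked numerically in the simplest special case. The sketch below evaluates the first exponent function for $N = 1$, a memoryless MAC and no feedback, where causal conditioning reduces to ordinary conditioning. The channel (mod-2 sum of the inputs flipped with probability 0.1), the uniform inputs, and the function names are assumptions chosen for illustration only.

```python
import math

def gallager_e1(rho, q1, q2, channel):
    """First exponent function for N = 1, in bits.
    q1, q2: {symbol: prob}; channel: {(x1, x2): {y: prob}}."""
    outputs = {y for dist in channel.values() for y in dist}
    total = 0.0
    for x2, p2 in q2.items():
        for y in outputs:
            inner = sum(p1 * channel[(x1, x2)].get(y, 0.0) ** (1.0 / (1.0 + rho))
                        for x1, p1 in q1.items())
            total += p2 * inner ** (1.0 + rho)
    return -math.log2(total)

def cond_mutual_info_bits(q1, q2, channel):
    """I(X1; Y | X2) in bits for independent inputs q1, q2."""
    outputs = {y for dist in channel.values() for y in dist}
    info = 0.0
    for x2, p2 in q2.items():
        for y in outputs:
            p_y = sum(p1 * channel[(x1, x2)].get(y, 0.0) for x1, p1 in q1.items())
            for x1, p1 in q1.items():
                p = channel[(x1, x2)].get(y, 0.0)
                if p > 0:
                    info += p2 * p1 * p * math.log2(p / p_y)
    return info

# Hypothetical noisy mod-2 adder MAC: Y = X1 xor X2 xor noise, noise ~ Bern(0.1).
uniform = {0: 0.5, 1: 0.5}
adder = {(a, b): {a ^ b: 0.9, 1 - (a ^ b): 0.1} for a in (0, 1) for b in (0, 1)}
slope_at_zero = (gallager_e1(1e-5, uniform, uniform, adder)
                 - gallager_e1(0.0, uniform, uniform, adder)) / 1e-5
```

In this example the exponent vanishes at $\rho = 0$, its slope at zero matches $I(X_1; Y \mid X_2) = 1 - h_2(0.1)$, and the function is concave in $\rho$ on the sampled points.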

The proof of the lemma is the same as the proof of [21, eq. (2.20)], [17, Theorem 5.6.3]. In [21] the arguments $Q_N$ of $E_{N,i}(\rho, Q_N, s_0)$ are regular conditioning, i.e., $Q(x_1^N)Q(x_2^N)$, and the channel is given by $P(y^N \mid x_1^N, x_2^N, s_0)$, hence the derivative of $E_{N,1}(\rho, Q_N, s_0)$ with respect to $\rho$ is upper-bounded by $I(X_1^N; Y^N \mid X_2^N, s_0)$. Here we replace $Q(x_1^N)Q(x_2^N)$ with $Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$ and $P(y^N \mid x_1^N, x_2^N, s_0)$ with $P(y^N\|x_1^N, x_2^N, s_0)$ and, according to Lemma 3, the upper bound becomes $I(X_1^N \to Y^N\|X_2^N, s_0)$. The next lemma establishes the sup-additivity of $F_{n,i}(\rho, Q_n)$, $i = 1, 2, 3$.

Lemma 19: Sup-additivity of $F_{n,i}(\rho, Q_n)$. For any finite-state channel, $F_{n,i}(\rho, Q_n)$, as given by eq. (41), satisfies

\[
F_{n+l,i}(\rho, Q_{n+l}) \ge \frac{n}{n+l} F_{n,i}(\rho, Q_n) + \frac{l}{n+l} F_{l,i}(\rho, Q_l), \quad i = 1, 2, 3.
\tag{48}
\]

The proof steps are identical to the proof of the sup-additivity for the point-to-point channel [27, Lemma 11]. Invoking this lemma on the pmf $Q_N = \prod_{k=1}^{K} Q_n$, where $N = nK$, we get

\[
F_{N,i}(\rho, Q_N) \ge K \frac{n}{N} F_{n,i}(\rho, Q_n) = F_{n,i}(\rho, Q_n).
\tag{49}
\]

Let us define 

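\[
\underline{C}_{N,1}(Q_N) = \frac{1}{N} \min_{s_0} I(X_1^N \to Y^N\|X_2^N, s_0)
\tag{50}
\]

\[
\underline{C}_{N,2}(Q_N) = \frac{1}{N} \min_{s_0} I(X_2^N \to Y^N\|X_1^N, s_0)
\tag{51}
\]

\[
\underline{C}_{N,3}(Q_N) = \frac{1}{N} \min_{s_0} I(X_1^N, X_2^N \to Y^N \mid s_0)
\tag{52}
\]

where the joint distribution of $X_1^N, X_2^N, Y^N$ conditioned on $s_0$ is given by $P(x_1^N, x_2^N, y^N \mid s_0) = Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})P(y^N\|x_1^N, x_2^N, s_0)$.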

Theorem 5 (inner bound) given in Sec. IV states that for every $n$ and $0 \le R_i < \underline{C}_{n,i}(Q_n) - \frac{\log|\mathcal{S}|}{n}$, $i = 1, 2, 3$ (recall, $R_3 = R_1 + R_2$), and every $\eta > 0$ there exists an $N$ and an $(N, \lceil 2^{NR_1} \rceil, \lceil 2^{NR_2} \rceil)$ code with a probability of error $P_e(s_0)$ (averaged over the messages) that is less than $\eta$ for all initial states $s_0$.

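Proof of Theorem 5: The proof consists of the following three steps:

• Showing that for a fixed $n$, if $R_i < \underline{C}_{n,i}(Q_n) - \frac{\log|\mathcal{S}|}{n}$, $i = 1, 2, 3$, then there exists $\rho^*$ such that

\[
F_{n,i}(\rho^*, Q_n) - \rho^* R_i > 0, \quad i = 1, 2, 3.
\tag{53}
\]

• We choose $\epsilon < \min_{i \in \{1,2,3\}} F_{n,i}(\rho^*, Q_n) - \rho^* R_i$ and show that for sufficiently large $N$

\[
E[P_{ei}(s_0) \mid m_1, m_2] \le 2^{-N(F_{n,i}(\rho^*, Q_n) - \rho^* R_i - \epsilon)}, \quad \forall s_0.
\tag{54}
\]

• From the last step we deduce the existence of an $(N, \lceil 2^{NR_1} \rceil, \lceil 2^{NR_2} \rceil)$ code s.t.

\[
P_e(s_0) \le \eta, \quad \forall s_0.
\tag{55}
\]

First step: for any pair $(R_1, R_2)$, we can rewrite eq. (40) for $i = 1, 2, 3$ as

\[
E[P_{ei}(s_0) \mid m_1, m_2] \le 2^{-N\left(F_{N,i}(\rho, Q_N) - \rho R_i - \frac{\log|\mathcal{S}|}{N}\right)}.
\tag{56}
\]

By using (49), which states that $F_{N,i}(\rho, Q_N) \ge F_{n,i}(\rho, Q_n)$, we get

\[
E[P_{ei}(s_0) \mid m_1, m_2] \le 2^{-N\left(F_{n,i}(\rho, Q_n) - \rho R_i - \frac{\log|\mathcal{S}|}{N}\right)}.
\tag{57}
\]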



Note that $F_{n,i}(\rho, Q_n)$, and therefore $F_{n,i}(\rho, Q_n) - \rho R_i$, is continuous in $\rho \in [0, 1]$, so there exists a maximizing $\rho$. Let us show that if $R_1 < \underline{C}_{n,1}(Q_n) - \frac{\log|\mathcal{S}|}{n}$, then $\max_{0 \le \rho \le 1}[F_{n,1}(\rho, Q_n) - \rho R_1] > 0$ (the cases $i = 2, 3$ are identical to $i = 1$). Let us define $\delta = \underline{C}_{n,1} - R_1$. From Lemma 18 we have that $E_{n,1}(\rho, Q_n, s_0)$ is zero when $\rho = 0$, is a continuous function of $\rho$, and its derivative at zero with respect to $\rho$ is equal to or greater than $\underline{C}_{n,1}$, which satisfies $\underline{C}_{n,1} > R_1 + \frac{\log|\mathcal{S}|}{n}$. Thus, for each state $s_0$ there is a range $\rho > 0$ such that

\[
E_{n,1}(\rho, Q_n, s_0) - \rho\left(R_1 + \frac{\log|\mathcal{S}|}{n}\right) > 0.
\tag{58}
\]

Moreover, because the number of states is finite, there exists a $\rho^* > 0$ for which the inequality (58) is true for all $s_0$. Thus, from the definition of $F_{n,1}(\rho^*, Q_n)$ given in (41) and from (58),

\[
F_{n,1}(\rho^*, Q_n) = -\frac{\rho^* \log|\mathcal{S}|}{n} + \min_{s_0} E_{n,1}(\rho^*, Q_n, s_0) > \rho^* R_1.
\tag{59}
\]

Second step: We choose a positive number $\epsilon$ such that $\epsilon < \min_{i \in \{1,2,3\}} F_{n,i}(\rho^*, Q_n) - \rho^* R_i$. It follows from (57) that for every $N$ that satisfies $N \ge \frac{\log|\mathcal{S}|}{\epsilon}$,

\[
E[P_{ei}(s_0) \mid m_1, m_2] \le 2^{-N(F_{n,i}(\rho^*, Q_n) - \rho^* R_i - \epsilon)},
\tag{60}
\]

and according to the first step of the proof the exponent $F_{n,i}(\rho^*, Q_n) - \rho^* R_i - \epsilon$ is strictly positive.

Third step: According to the previous step, for all $\eta > 0$ there exists an $N$ such that $E[P_{ei}(s_0) \mid m_1, m_2] \le \frac{\eta}{3(|\mathcal{S}|+1)}$ for all $i \in \{1, 2, 3\}$, all $s_0 \in \mathcal{S}$ and all messages. Since $P_e(s_0) = \sum_{i=1}^{3} P_{ei}(s_0)$, then $E[P_e(s_0) \mid m_1, m_2] \le \frac{\eta}{|\mathcal{S}|+1}$; furthermore $E[P_e(s_0)] \le \frac{\eta}{|\mathcal{S}|+1}$ for all $s_0 \in \mathcal{S}$. By using the Markov inequality, we have

\[
\Pr(P_e(s_0) \ge \eta) \le \frac{1}{|\mathcal{S}|+1},
\tag{61}
\]

and by using the union bound we have

\[
\Pr(P_e(s_0) \ge \eta \text{ for some } s_0 \in \mathcal{S}) \le \sum_{s_0 \in \mathcal{S}} \Pr(P_e(s_0) \ge \eta) = \frac{|\mathcal{S}|}{|\mathcal{S}|+1} < 1.
\tag{62}
\]

Because the probability over the ensemble of codes of having a code with error probability (averaged over all messages) that is less than $\eta$ for all initial states is positive, there must exist at least one code that has an error probability (averaged over all messages) that is less than $\eta$ for all initial states. ■
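
The existence argument of the third step (Markov inequality followed by a union bound over the initial states) can be illustrated on a toy ensemble. The table of error probabilities below is entirely hypothetical, and the per-state threshold $\eta/(|\mathcal{S}|+1)$ on the ensemble average is taken as an assumption of the sketch.

```python
from fractions import Fraction as F

def ensemble_average_ok(error_table, eta):
    """True if, for every initial state, the ensemble-average error is
    at most eta / (|S| + 1). error_table[c][s] is the error probability
    of code c started in state s."""
    num_codes, num_states = len(error_table), len(error_table[0])
    threshold = eta / (num_states + 1)
    return all(sum(row[s] for row in error_table) / num_codes <= threshold
               for s in range(num_states))

def bad_code_fraction(error_table, eta):
    """Fraction of codes with error >= eta for at least one state."""
    bad = sum(1 for row in error_table if any(p >= eta for p in row))
    return F(bad, len(error_table))

def find_good_code(error_table, eta):
    """Index of a code with error < eta for every state, or None."""
    for index, row in enumerate(error_table):
        if all(p < eta for p in row):
            return index
    return None

eta = F(3, 10)
table = [[F(2, 5), F(0)], [F(0), F(2, 5)], [F(0), F(0)], [F(0), F(0)]]
union_bound = F(len(table[0]), len(table[0]) + 1)   # |S| / (|S| + 1)
```

Here half of the codes are bad for some state, which is below the union bound of 2/3, so a code that is good for every initial state must exist.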



VII. Proof of the Outer Bound (Theorem 6)



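In this section we prove Theorem 6, which states that for any FS-MAC there exists a distribution $Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})$ such that the following inequalities hold:

\[
\begin{aligned}
R_1 &\le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n) + \epsilon_n \\
R_2 &\le \frac{1}{n} I(X_2^n \to Y^n\|X_1^n) + \epsilon_n \\
R_1 + R_2 &\le \frac{1}{n} I((X_1, X_2)^n \to Y^n) + \epsilon_n,
\end{aligned}
\tag{63}
\]

where $\epsilon_n$ goes to zero as $n$ goes to infinity.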

Proof of Theorem 6: Let $W_1$ and $W_2$ be two independent messages, chosen independently and according to a uniform distribution $\Pr(W_l = w_l) = 2^{-nR_l}$, $l = 1, 2$. The input to the channel from encoder $l$ at time $i$ is $x_{l,i}$, and is a function of the message $W_l$ and the arbitrary deterministic feedback output $z_l^{i-1}(y^{i-1})$.

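The following sequence of equalities and inequalities proves that if a code that achieves rate $R_1$ exists then the first inequality holds, i.e., $R_1 \le \frac{1}{n} I(X_1^n \to Y^n\|X_2^n) + \epsilon_n$:

\[
\begin{aligned}
nR_1 &\stackrel{(a)}{=} H(W_1) \\
&\stackrel{(b)}{=} H(W_1 \mid W_2) \\
&= I(W_1; Y^n \mid W_2) + H(W_1 \mid Y^n, W_2) \\
&\stackrel{(c)}{\le} I(Y^n; W_1 \mid W_2) + 1 + P_e^{(n)} nR_1 \\
&= H(Y^n \mid W_2) - H(Y^n \mid W_1, W_2) + 1 + P_e^{(n)} nR_1 \\
&\stackrel{(d)}{=} \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, W_2) - \sum_{i=1}^{n} H(Y_i \mid W_1, W_2, Y^{i-1}) + 1 + P_e^{(n)} nR_1 \\
&\stackrel{(e)}{=} \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, W_2, X_2^i) - \sum_{i=1}^{n} H(Y_i \mid W_1, W_2, Y^{i-1}, X_1^i, X_2^i) + 1 + P_e^{(n)} nR_1 \\
&\stackrel{(f)}{\le} \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, X_2^i) - \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, X_1^i, X_2^i) + 1 + P_e^{(n)} nR_1 \\
&= \sum_{i=1}^{n} I(Y_i; X_1^i \mid Y^{i-1}, X_2^i) + 1 + P_e^{(n)} nR_1 \\
&= I(X_1^n \to Y^n\|X_2^n) + 1 + P_e^{(n)} nR_1,
\end{aligned}
\tag{64}
\]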
where,

(a) and (b) follow from the fact that the messages $W_1$ and $W_2$ are independent and chosen according to a uniform distribution,

(c) follows from Fano's inequality,

(d) follows from the chain rule,

(e) follows from the fact that $x_{l,i}$ is a deterministic function given the message $W_l$ and the feedback $z_l^{i-1}$, where the feedback $z_l^{i-1}$ is a deterministic function of the output $y^{i-1}$,

(f) follows from the fact that conditioning reduces entropy and that the random variables $W_1, W_2, X_1^i, X_2^i, Y^i$ form the Markov chain $(W_1, W_2) - (X_1^i, X_2^i, Y^{i-1}) - Y_i$.

Dividing (64) by $n$, we conclude that if there exists a code for which the error probability of decoding the messages $W_1, W_2$ is $P_e^{(n)}$, then the distribution $Q(x_1^n\|z_1^{n-1})Q(x_2^n\|z_2^{n-1})$ induced by the code satisfies the first inequality of the outer bound theorem, where $\epsilon_n = \frac{1}{n} + P_e^{(n)} R_1$. The proofs of the other two inequalities in (63) follow by a completely analogous sequence of steps as in (64): the proof of the second inequality of the outer bound starts with the equalities $nR_2 = H(W_2) = H(W_2 \mid W_1)$ and the third with $n(R_1 + R_2) = H(W_1, W_2)$. ■
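
To see how quickly the Fano slack term shrinks, the following sketch evaluates $\epsilon_n = \frac{1}{n} + P_e^{(n)} R$ and tests a rate against an outer-bound inequality of this form. The numerical values and the function names are illustrative assumptions; the normalized directed information is simply passed in as a number.

```python
def fano_slack(n, error_prob, rate):
    """epsilon_n obtained by dividing the Fano term 1 + P_e * n * R by n."""
    return 1.0 / n + error_prob * rate

def outer_bound_holds(rate, normalized_directed_info, n, error_prob):
    """Check rate <= (1/n) I(...) + epsilon_n for given numbers."""
    return rate <= normalized_directed_info + fano_slack(n, error_prob, rate)

slack_small_n = fano_slack(100, 0.01, 0.5)       # 0.01 + 0.005
slack_large_n = fano_slack(10000, 0.0001, 0.5)   # 0.0001 + 0.00005
```

As the block length grows and the error probability vanishes, the slack disappears, so any achievable rate must eventually fall inside the region defined by the directed-information terms alone.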

Corollary 20: The outer bound given in Theorem 6 implies that $\liminf \overline{\mathcal{R}}_n$ is an outer bound for the achievable region.

Proof: Recall the definition of $\overline{\mathcal{R}}_n$ in eq. (20). Let $(R_1, R_2)$ be an achievable rate pair. We will create a sequence of rate pairs $(R_{1,n}, R_{2,n}) \in \overline{\mathcal{R}}_n$ that converges to $(R_1, R_2)$ and therefore, by the definition of $\liminf$ of a sequence of sets (given in Appendix IV), $(R_1, R_2) \in \liminf \overline{\mathcal{R}}_n$.




If $(R_1, R_2) \in \overline{\mathcal{R}}_n$ then we choose $(R_{1,n}, R_{2,n}) = (R_1, R_2)$. Otherwise we choose the closest point in $\overline{\mathcal{R}}_n$ to $(R_1, R_2)$. Because of inequality (63), the distance satisfies $\|(R_{1,n}, R_{2,n}) - (R_1, R_2)\| \le 2\epsilon_n$ and, therefore, the sequence $(R_{1,n}, R_{2,n})$ converges to $(R_1, R_2)$. ■



VIII. Capacity Region of the FS-MAC without Feedback 

The inner and outer bounds given in Theorems 5 and 6 specialize to the case where there is no feedback, i.e., $z_1, z_2$ are null. Hence, we can use them in order to extend Gallager's results [17, Ch. 4] on the capacity of indecomposable FSCs to indecomposable FS-MACs. An indecomposable FS-MAC (FSC) is a FS-MAC (FSC) for which the effect of the initial state vanishes with time. More precisely:

Definition 3: A FS-MAC (FSC) is indecomposable if, for every $\epsilon > 0$, there exists an $n_0$ such that for $n \ge n_0$, $|P(s_n \mid x_1^n, x_2^n, s_0) - P(s_n \mid x_1^n, x_2^n, s_0')| \le \epsilon$ for all $s_n, x_1^n, x_2^n, s_0$ and $s_0'$.
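
For the special case in which the state evolves as a Markov chain that is not affected by the inputs, the definition reduces to a statement about the rows of the $n$-step transition matrix. The sketch below computes the largest gap between rows; the two example matrices and the function names are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_power(p, n):
    """n-step transition matrix P^n for n >= 0."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result

def initial_state_gap(transition, n):
    """max over s0, s0', s_n of |P(s_n | s0) - P(s_n | s0')|."""
    pn = mat_power(transition, n)
    size = len(pn)
    return max(abs(pn[a][s] - pn[b][s])
               for a in range(size) for b in range(size) for s in range(size))

ergodic_chain = [[0.9, 0.1], [0.2, 0.8]]     # effect of s0 decays like 0.7^n
two_absorbing = [[1.0, 0.0], [0.0, 1.0]]     # initial state is never forgotten
```

The first chain satisfies the definition (the gap falls below any $\epsilon$ for large $n$), whereas the second one does not.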

Since there is no feedback, according to Lemma 4 directed information becomes mutual information and causal conditioning becomes regular conditioning in all the expressions in the inner bound (Theorem 5) and outer bound (Theorem 6).

The proof of the capacity region of FS-MAC is based on the following two lemmas. The first lemma is used for showing that the difference between the lower bound and the upper bound goes to zero as $n \to \infty$ and the second lemma, which is proved in Appendix V, is used for showing that the limits exist.

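Lemma 21: Let $\{Q(x_1^n)Q(x_2^n)\}_{n \ge 1}$ be an arbitrary sequence of input distributions. If the channel is an indecomposable FS-MAC then the following holds for all $s_0, s_0'$:

\[
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} \left| I(X_1^n; Y^n \mid X_2^n, s_0) - I(X_1^n; Y^n \mid X_2^n, s_0') \right| &= 0 \\
\lim_{n\to\infty} \frac{1}{n} \left| I(X_2^n; Y^n \mid X_1^n, s_0) - I(X_2^n; Y^n \mid X_1^n, s_0') \right| &= 0 \\
\lim_{n\to\infty} \frac{1}{n} \left| I(X_1^n, X_2^n; Y^n \mid s_0) - I(X_1^n, X_2^n; Y^n \mid s_0') \right| &= 0.
\end{aligned}
\tag{65}
\]

Proof: The proof is identical to the proof of Theorem 4.6.4 in [17]. ■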
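The following lemma, which is proved in Appendix V, establishes the sup-additivity of $\{\underline{\mathcal{R}}_n\}$.

Lemma 22: (sup-additivity of $\underline{\mathcal{R}}_n$.) For any FS-MAC, the sequence $\{\underline{\mathcal{R}}_n\}$, which is defined in (19), is sup-additive, i.e.,

\[
(n + l)\underline{\mathcal{R}}_{n+l} \supseteq n\underline{\mathcal{R}}_n + l\underline{\mathcal{R}}_l,
\tag{66}
\]

and therefore $\lim_{n\to\infty} \underline{\mathcal{R}}_n$ exists. Moreover, for an indecomposable FS-MAC without feedback $\lim_{n\to\infty} \underline{\mathcal{R}}_n = \lim_{n\to\infty} \overline{\mathcal{R}}_n$, where $\overline{\mathcal{R}}_n$ is defined in (20).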

Proof of Theorem 7: Theorem 5 implies that $\lim_{n\to\infty} \underline{\mathcal{R}}_n$ is achievable, and Corollary 20 implies that $\liminf_{n\to\infty} \overline{\mathcal{R}}_n$ is an outer bound. Finally, since according to Lemma 22 the two limits are equal to $\lim_{n\to\infty} \overline{\mathcal{R}}_n$, the capacity region is given by the last limit. ■

IX. Sufficient Conditions for the Inner and Outer Bounds to Coincide for General Feedback



A. Stationary Finite state Markovian MAC with feedback 



A stationary finite state Markovian MAC satisfies 



\[
P(y_i, s_i \mid x_{1,i}, x_{2,i}, s_{i-1}) = P(s_i \mid s_{i-1}) P(y_i \mid s_{i-1}, x_{1,i}, x_{2,i}),
\tag{67}
\]

where the initial state distribution is the stationary distribution $P(s_0)$. In words, the states are not affected by the channel inputs.
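
A minimal simulation sketch of a channel with this structure is given below: the state follows its own Markov chain, started from the stationary distribution, and the output at time $i$ depends on the previous state and the two current inputs. The specific output rule (mod-2 sum of the inputs, flipped with a state-dependent probability) and all names are assumptions chosen for illustration.

```python
import random

def stationary_distribution(transition, iterations=1000):
    """Approximate the stationary distribution by power iteration."""
    size = len(transition)
    dist = [1.0 / size] * size
    for _ in range(iterations):
        dist = [sum(dist[a] * transition[a][b] for a in range(size))
                for b in range(size)]
    return dist

def simulate_markov_mac(transition, flip_prob, x1, x2, seed=0):
    """Outputs y_i = x1_i xor x2_i xor noise_i, where the noise probability
    is set by the previous state s_{i-1}; inputs never affect the state."""
    rng = random.Random(seed)
    states = list(range(len(transition)))
    state = rng.choices(states, weights=stationary_distribution(transition))[0]
    outputs = []
    for a, b in zip(x1, x2):
        noise = 1 if rng.random() < flip_prob[state] else 0
        outputs.append(a ^ b ^ noise)
        state = rng.choices(states, weights=transition[state])[0]
    return outputs

chain = [[0.9, 0.1], [0.2, 0.8]]
pi = stationary_distribution(chain)
noiseless = simulate_markov_mac(chain, [0.0, 0.0], [0, 1, 1, 0], [1, 1, 0, 0])
```

With the flip probabilities set to zero the output is just the mod-2 sum of the inputs, which gives a quick sanity check of the simulator.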

For the stationary Markovian-MAC, the sequence $\{\overline{\mathcal{R}}_n\}$ is sup-additive. It follows from the fact that if we concatenate two input distributions $Q_{n+k} = Q_n Q_k$, then $I(X_1^{n+k} \to Y^{n+k}\|X_2^{n+k}) = I(X_1^n \to Y^n\|X_2^n) + I(X_{1,n+1}^{n+k} \to Y_{n+1}^{n+k}\|X_{2,n+1}^{n+k})$, hence $(n + k)\overline{\mathcal{R}}_{n+k} \supseteq n\overline{\mathcal{R}}_n + k\overline{\mathcal{R}}_k$. According to Lemma 23 the limit exists and is equal to

\[
\lim_{n\to\infty} \overline{\mathcal{R}}_n = \mathrm{cl}\Big( \bigcup_{n \ge 1} \overline{\mathcal{R}}_n \Big).
\tag{68}
\]

Next, we prove Theorem 8, which states that for a Markovian FS-MAC with a stationary ergodic state process, the inner bound (Theorem 5) and the outer bound (Theorem 6) coincide and therefore the capacity region is given by $\lim_{n\to\infty} \overline{\mathcal{R}}_n$.

Proof of Theorem 8: Recall that the inner bound is given in Theorem 5 as $\underline{\mathcal{R}}_N$ and the outer bound is given in Theorem 6 and in Corollary 20 as $\liminf \overline{\mathcal{R}}_N$. Next we show that the distance between $\underline{\mathcal{R}}_N$ and $\overline{\mathcal{R}}_N$ goes to zero, which implies by Lemma 25 that both limits are equal and therefore the capacity region can be written as $\lim \overline{\mathcal{R}}_N$.

Let us consider a specific input distribution denoted by $Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$ corresponding to the region of the outer bound $\overline{\mathcal{R}}_N$. Let us now consider an input distribution $\underline{Q}$ for $n + N$ inputs corresponding to the inner bound $\underline{\mathcal{R}}_{N+n}$, such that it is arbitrary for the first $n$ inputs and then it is $Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$.

Now let us show that the term of the inner bound, i.e. $I_{\underline{Q}}(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n}, s_0)$, and the term of the outer bound, $I_Q(X_1^N \to Y^N\|X_2^N)$, are arbitrarily close to each other.

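\[
\begin{aligned}
& I_{\underline{Q}}(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n}, s_0) \\
&\stackrel{(a)}{\ge} I_{\underline{Q}}(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n}, S_n, s_0) - \log|\mathcal{S}| \\
&\stackrel{(b)}{\ge} \sum_{i=n+1}^{N+n} \left[ H_{\underline{Q}}(Y_i \mid Y^{i-1}, X_2^i, S_n, s_0) - H_{\underline{Q}}(Y_i \mid Y^{i-1}, X_1^i, X_2^i, S_n, s_0) \right] - \log|\mathcal{S}| \\
&\stackrel{(c)}{\ge} \sum_{i=n+1}^{N+n} \left[ H_{\underline{Q}}(Y_i \mid Y_{n+1}^{i-1}, X_{2,n+1}^i, S_n, s_0) - H_{\underline{Q}}(Y_i \mid Y_{n+1}^{i-1}, X_{1,n+1}^i, X_{2,n+1}^i, S_n, s_0) \right] - \log|\mathcal{S}| \\
&= I_{\underline{Q}}(X_{1,n+1}^{N+n} \to Y_{n+1}^{N+n}\|X_{2,n+1}^{N+n}, S_n, s_0) - \log|\mathcal{S}| \\
&\stackrel{(d)}{\ge} I_{Q}(X_{1,n+1}^{N+n} \to Y_{n+1}^{N+n}\|X_{2,n+1}^{N+n}, S_n) - \delta(N+n)\log|\mathcal{Y}| - \log|\mathcal{S}| \\
&\stackrel{(e)}{\ge} I_{Q}(X_{1,n+1}^{N+n} \to Y_{n+1}^{N+n}\|X_{2,n+1}^{N+n}) - \delta(N+n)\log|\mathcal{Y}| - 2\log|\mathcal{S}| \\
&\stackrel{(f)}{\ge} I_{Q}(X_1^N \to Y^N\|X_2^N) - \delta(N+n)\log|\mathcal{Y}| - 2\log|\mathcal{S}|,
\end{aligned}
\tag{69}
\]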
where

(a) follows from Lemma 2, which states that conditioning on $S_n$ can change the directed information by at most $\log|\mathcal{S}|$,

(b) follows from omitting the first $n$ elements in the sum that defines directed information,

(c) follows from the fact that conditioning decreases entropy,

(d) follows from the fact that the Markov chain is ergodic, hence for any $\delta > 0$ there exists an $n$ such that $|P(s_n \mid s_0) - P(s_n)| \le \delta$ for any $s_0 \in \mathcal{S}$ and $s_n \in \mathcal{S}$, where $P(s_n)$ is the stationary distribution of $s_n$,

(e) follows from Lemma 2, which states that conditioning on $S_n$ can change the directed information by at most $\log|\mathcal{S}|$,

(f) follows from the stationarity of the channel.
Dividing both sides by $N + n$ we get that for any $s_0$,

\[
\frac{1}{N+n} I_{\underline{Q}}(X_1^{N+n} \to Y^{N+n}\|X_2^{N+n}, s_0) - \frac{1}{N} I_Q(X_1^N \to Y^N\|X_2^N) \ge -\left(\delta + \frac{n}{N+n}\right)\log|\mathcal{Y}| - \frac{2\log|\mathcal{S}|}{N+n}.
\tag{70}
\]

Inequality (70) shows that the difference between the upper bound region and the lower bound region is arbitrarily small for $N$ large enough and, hence, in the limit the regions coincide. ■

B. Finite State Markovian MAC with limited ISI 

In this subsection we consider a MAC inspired by Kim's point-to-point channel [22]. The conditional probability 
of the MAC is given by 

\[
P(y_i, z_i \mid x_1^i, x_2^i, z_{i-1}) = P(z_i \mid z_{i-1}) P(y_i \mid z_{i-1}, x_{1,i-m}^{i}, x_{2,i-m}^{i}), \quad i = 1, 2, 3, \dots
\tag{71}
\]

where the distribution of $Z_0$ is the stationary distribution $P(z_0)$, and there is also some initial distribution $P(x_{-m+1}, \dots, x_0)$.

This channel is a FS-MAC where the state at time $i$ is $(z_{i-1}, x_{1,i-m}^{i-1}, x_{2,i-m}^{i-1})$, and therefore the inner bound (Theorem 5) and the outer bound (Theorem 6) apply to this channel. Theorem 8 also holds for this kind of channel, namely, the capacity region is given by $\lim_{n\to\infty} \overline{\mathcal{R}}_n$. The proof is very similar, the only difference being that the input $\underline{Q}$ for $n + N$ inputs is constructed slightly differently: it is arbitrary for the first $n - m$ inputs, then it is as the initial distribution $P(x_{-m+1}, \dots, x_0)$, and then it is $Q(x_1^N\|z_1^{N-1})Q(x_2^N\|z_2^{N-1})$.

It is also possible to represent the channel with an alternative law, identical to the law of the channel given in eq. (71) for $i \ge m + 1$, but for $i \le m$ the output $y_i$ is not influenced by the input and is, with probability 1, a particular output $\phi \in \mathcal{Y}$. Let us define $\overline{\mathcal{R}}'_n$ similarly to $\overline{\mathcal{R}}_n$ but with the alternative law for the channel. On one hand, it is clear that $\overline{\mathcal{R}}'_n \subseteq \overline{\mathcal{R}}_n$ for all $n$, and on the other hand the difference between $\overline{\mathcal{R}}'_n$ and $\overline{\mathcal{R}}_n$ is at most $m \log|\mathcal{Y}|$ because it is possible to use the distribution of the first $m$ inputs, $Q(x^m)$, to create a desired initial distribution and then use the same input as in $\overline{\mathcal{R}}_n$. Hence,

\[
\lim_{n\to\infty} \overline{\mathcal{R}}'_n = \lim_{n\to\infty} \overline{\mathcal{R}}_n.
\tag{72}
\]

The advantage of analyzing $\overline{\mathcal{R}}'_n$ rather than $\overline{\mathcal{R}}_n$ is that the sequence $n\overline{\mathcal{R}}'_n$ is sup-additive, i.e. $(n + l)\overline{\mathcal{R}}'_{n+l} \supseteq n\overline{\mathcal{R}}'_n + l\overline{\mathcal{R}}'_l$, and according to Lemma 23 $\lim_{n\to\infty} \overline{\mathcal{R}}'_n = \mathrm{cl}\big(\bigcup_{n \ge 1} \overline{\mathcal{R}}'_n\big)$. Hence, we can conclude that Theorem 9 holds for this channel too, namely, if the capacity of the Finite state Markovian MAC with limited ISI is zero without feedback then it is zero also in the presence of feedback.

X. Conclusions and Future Directions 

In this paper we have shown that directed information and causal conditioning emerge naturally in characterizing 
the capacity region of FS-MACs in the presence of a time-invariant feedback. The capacity region is given as a 
'multi-letter' expression and it is a first step toward deriving useful concepts in communication. For instance, we 
use this characterization in order to show that for a stationary and ergodic Markovian channel, the capacity is zero if 
and only if the capacity with feedback is zero. Further, we identify FS-MACs for which feedback does not enlarge 
the capacity region and for which source-channel separation holds. 

For the point-to-point channel with feedback, recent work has shown that, for some families of channels such as 
unifilar channels [28] or the additive Gaussian where the noise is ARMA [22], the directed information formula can 
be computed and, further, can lead to the development of capacity achieving coding schemes. One future direction 
is to use the characterizations developed in this paper to explicitly compute the capacity regions of classes of MACs 
with memory and feedback (other than the multiplexer followed by a point-to-point channel), and to find optimal 
coding schemes. 

Appendix I
Proof of Lemma 3

Recall that Lemma 3 states that if

\[
Q(x_1^N, x_2^N\|y^{N-1}) = Q(x_1^N\|y^{N-1}) Q(x_2^N\|y^{N-1})
\tag{73}
\]

then

\[
\mathcal{I}\big(Q(x_1^N, x_2^N\|y^{N-1}); P(y^N\|x_1^N, x_2^N)\big) = I(X_1^N \to Y^N\|X_2^N).
\tag{74}
\]



Proof: The following sequence of equalities proves the lemma. 



l(Q(x?,x»\\y N -i);P(y N \\x?,x»)) 

(o) 
(6) 



E^Q(*'fll» w - 1 )W|s'M 



(c) 



E ^(«^ 



E 
E 
E 

E 
E 



p(v N M,x?) 



Z x ,?Q&?\\v lf -\x?)P(v»\\x'* r > x?) 

Q(x^\\y N - 1 )P(y N \\x^, X ^) 

Q^II^-^Ex'f Q(x'?\\y N -\x»)P(y"\\x>?,x») 

Q(x^\\y N ' 1 )P(y N \\xf,x^) 



P(af,y w ) 



P(y»\\xP 



(d) 



N 



Y N \\X$) 



(75) 



(a) follows from the assumption given in eq. (73),

(b) follows from the definition of the functional $\mathcal{I}(Q; P)$ given in eq. (16),

(c) follows from Lemma 1, which states that $P(x_1^N, x_2^N, y^N) = Q(x_1^N, x_2^N\|y^{N-1}) P(y^N\|x_1^N, x_2^N)$, and the assumption given in (73),

(d) follows from the definition of directed information.

Appendix II
Proof of Lemma 4

Lemma 4 states that if

\[
Q(x_1^N, x_2^N\|y^{N-1}) = Q(x_1^N) Q(x_2^N)
\tag{76}
\]

then

\[
I(X_1^N; Y^N \mid X_2^N) = I(X_1^N \to Y^N\|X_2^N).
\tag{77}
\]



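Proof: The following sequence of equalities proves the lemma.

\[
\begin{aligned}
I(X_1^N; Y^N \mid X_2^N) &= E\left[ \log \frac{P(Y^N, X_1^N \mid X_2^N)}{P(Y^N \mid X_2^N) Q(X_1^N \mid X_2^N)} \right] \\
&\stackrel{(a)}{=} E\left[ \log \frac{P(Y^N, X_1^N, X_2^N)}{P(Y^N, X_2^N) Q(X_1^N \mid X_2^N)} \right] \\
&\stackrel{(b)}{=} E\left[ \log \frac{Q(X_1^N, X_2^N\|Y^{N-1}) P(Y^N\|X_1^N, X_2^N)}{P(Y^N\|X_2^N) Q(X_2^N\|Y^{N-1}) Q(X_1^N \mid X_2^N)} \right] \\
&\stackrel{(c)}{=} E\left[ \log \frac{Q(X_1^N) Q(X_2^N) P(Y^N\|X_1^N, X_2^N)}{P(Y^N\|X_2^N) Q(X_2^N) Q(X_1^N)} \right] \\
&= E\left[ \log \frac{P(Y^N\|X_1^N, X_2^N)}{P(Y^N\|X_2^N)} \right] \\
&= I(X_1^N \to Y^N\|X_2^N).
\end{aligned}
\tag{78}
\]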



(a) follows from multiplying the numerator and denominator by $P(x_2^N)$.

(b) follows from decomposing the joint distributions $P(y^N, x_1^N, x_2^N)$ and $P(y^N, x_2^N)$ into causal conditioning distributions by using Lemma 1.

(c) follows from the fact that the assumption of the lemma given in (76) implies that $Q(X_1^N, X_2^N) = Q(X_1^N) Q(X_2^N)$. This can be obtained by multiplying both sides of (76) by $P(y^N\|x_1^N, x_2^N)$ and then summing over all $y^N \in \mathcal{Y}^N$.



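Appendix III
Proof of Lemma 12

Lemma 12 states that

\[
\max_{Q(x_1^n\|y^{n-1}) Q(x_2^n\|y^{n-1})} I(X_1^n, X_2^n \to Y^n) = 0 \iff \max_{Q(x_1^n) Q(x_2^n)} I(X_1^n, X_2^n \to Y^n) = 0,
\tag{79}
\]

and each condition also implies that $P(y^n\|x_1^n, x_2^n) = P(y^n)$ for all $x_1^n, x_2^n$.

Proof: Proving the direction $\Longrightarrow$ is trivial since

\[
\max_{Q(x_1^n\|y^{n-1}) Q(x_2^n\|y^{n-1})} I(X_1^n, X_2^n \to Y^n) \ge \max_{Q(x_1^n) Q(x_2^n)} I(X_1^n, X_2^n \to Y^n).
\tag{80}
\]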
Q(x?||y"-i)Q(xJ||y"-i) Q(xf)Q(a:J) 

For the other direction, $\Longleftarrow$, we have the assumption that $I(X_1^n, X_2^n \to Y^n) = 0$ for all input distributions $Q(x_1^n) Q(x_2^n)$, and in particular for the case that $X_1^n$ and $X_2^n$ are uniformly distributed over their alphabets. Directed information can be written as a Kullback-Leibler divergence, i.e.,

\[
I(X_1^n, X_2^n \to Y^n) = \sum_{y^n, x_1^n, x_2^n} Q(x_1^n) Q(x_2^n) P(y^n\|x_1^n, x_2^n) \log \frac{Q(x_1^n) Q(x_2^n) P(y^n\|x_1^n, x_2^n)}{P(y^n) Q(x_1^n) Q(x_2^n)},
\tag{81}
\]

and by using the fact that if the Kullback-Leibler divergence $D(P\|Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$ is zero, then $P(x) = Q(x)$ for all $x \in \mathcal{X}$, we conclude that (81) implies that $P(y^n\|x_1^n, x_2^n) = P(y^n)$ for all $x_1^n \in \mathcal{X}_1^n$ and all $x_2^n \in \mathcal{X}_2^n$. It follows that

\[
\max_{Q(x_1^n\|y^{n-1}) Q(x_2^n\|y^{n-1})} I(X_1^n, X_2^n \to Y^n) = \max_{Q(x_1^n\|y^{n-1}) Q(x_2^n\|y^{n-1})} E\left[ \log \frac{P(Y^n\|X_1^n, X_2^n)}{P(Y^n)} \right] = \max_{Q(x_1^n\|y^{n-1}) Q(x_2^n\|y^{n-1})} E[0] = 0.
\tag{82}
\]



Appendix IV 

SUP-ADDITIVITY AND CONVERGENCE OF 2D REGIONS 

Let $A, B$ be sets in $\mathbb{R}^2$, i.e., $A$ and $B$ are sets of 2D vectors. The sum of two regions is denoted as $A + B$ and defined as

\[
A + B = \{a + b : a \in A, b \in B\},
\tag{83}
\]

and multiplication of a set $A$ with a scalar $c$ is defined as

\[
cA = \{ca : a \in A\}.
\tag{84}
\]

A sequence $\{A_n\}$, $n = 1, 2, 3, \dots$, of 2D regions is said to converge to a region $A$, written $A = \lim A_n$, if

\[
\limsup A_n = \liminf A_n = A
\tag{85}
\]

where

\[
\begin{aligned}
\liminf A_n &= \{a : a = \lim a_n, \ a_n \in A_n\}, \\
\limsup A_n &= \{a : a = \lim a_k, \ a_k \in A_{n_k}\},
\end{aligned}
\tag{86}
\]

and $n_k$ denotes an arbitrary increasing subsequence of the integers. An alternative and equivalent definition of $\limsup$ and $\liminf$ is given by $\limsup A_n = \bigcap_{n \ge 1} \mathrm{cl}\big(\bigcup_{m \ge n} A_m\big)$ and $\liminf A_n = \bigcup_{n \ge 1} \mathrm{cl}\big(\bigcap_{m \ge n} A_m\big)$. For more details on convergence of sets in finite dimensions see [48].
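
For finite point sets, the set sum and scalar multiplication defined in (83) and (84) are easy to compute. The sketch below is only an illustration on small hand-picked sets; the function names are assumptions.

```python
def minkowski_sum(a, b):
    """A + B = {a + b : a in A, b in B} for finite sets of 2D points."""
    return {(p[0] + q[0], p[1] + q[1]) for p in a for q in b}

def scale(c, a):
    """cA = {c * a : a in A} for a finite set of 2D points."""
    return {(c * p[0], c * p[1]) for p in a}

segment_x = {(0, 0), (1, 0)}
segment_y = {(0, 0), (0, 1)}
square_corners = minkowski_sum(segment_x, segment_y)
doubled = scale(2, {(1, 2)})
```

Summing the endpoints of a horizontal and a vertical unit segment gives the four corners of the unit square, which is the finite-set analogue of adding two regions.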
Let $\bar{A}$ denote

\[
\bar{A} = \mathrm{cl}\Big( \bigcup_{n \ge 1} A_n \Big).
\tag{87}
\]

We say that a sequence $\{A_n\}_{n \ge 1}$ is bounded if $\sup\{\|a\| : a \in \bar{A}\} < \infty$, where $\|\cdot\|$ denotes a norm in $\mathbb{R}^2$.

Lemma 23: Let $A_n$, $n = 1, 2, \dots$, be a bounded sequence of sets in $\mathbb{R}^2$ that includes the origin, i.e. $(0, 0)$. If $nA_n$ is sup-additive, i.e., for all $n \ge 1$ and all $N > n$

\[
NA_N \supseteq nA_n + (N - n)A_{N-n}
\tag{88}
\]

then

\[
\lim_{n\to\infty} A_n = \bar{A}.
\tag{89}
\]

Proof: From the definitions we have $\bar{A} \supseteq \limsup A_n \supseteq \liminf A_n$. Hence it is enough to show that $\bar{A} \subseteq \liminf A_n$.

Let $a$ be a point in $\bar{A}$. Then for every $\epsilon > 0$ there exists an $n$ and a point $a_\epsilon$ such that $a_\epsilon \in A_n$ and $\|a - a_\epsilon\| \le \epsilon$. By induction we prove that for any integer $m \ge 2$, $A_n \subseteq A_{nm}$, and this implies that $a_\epsilon \in A_{mn}$. For $m = 2$ we choose $N = 2n$ and we get that

\[
A_{2n} \supseteq \frac{1}{2} A_n + \frac{1}{2} A_n \supseteq A_n.
\tag{90}
\]

Now assume that it holds for $m - 1$ and let us show that it holds for $m$.

\[
A_{mn} \supseteq \frac{1}{m} A_n + \frac{m-1}{m} A_{(m-1)n} \supseteq \frac{1}{m} A_n + \frac{m-1}{m} A_n \supseteq A_n.
\tag{91}
\]

Now, for any $N > n$, we can represent $N$ as $mn + j$ where $0 \le j \le n - 1$, hence

\[
A_{mn+j} \supseteq \frac{j}{mn+j} A_j + \frac{mn}{mn+j} A_{mn}.
\tag{92}
\]

Because $a_\epsilon$ is in $A_n$, it is in $A_{mn}$ too. Following (92) and the fact that $(0, 0) \in A_j$ we obtain

\[
\frac{mn}{mn+j} a_\epsilon \in A_{mn+j}.
\tag{93}
\]

For any $\delta > 0$ and for any $N > \frac{n}{\delta}$ we conclude the existence of an element in $A_N$ for which the distance from $a$ can be upper-bounded by

\[
\left\| \frac{mn}{mn+j} a_\epsilon - a \right\| = \left\| a_\epsilon - a - \frac{j}{mn+j} a_\epsilon \right\| \le \|a_\epsilon - a\| + \delta\|a_\epsilon\| \le \epsilon + \delta\|a_\epsilon\|.
\tag{94}
\]

Because $\epsilon$ and $\delta$ are arbitrarily small we can find a sequence of points $a_n \in A_n$ that converges to $a$ and therefore $a \in \liminf A_n$, which implies that $\bar{A} \subseteq \liminf A_n$. ■

Corollary 24: For a sup-additive sequence, as defined in Lemma 23, the limit is convex.

This corollary follows immediately from the definition of the sup-additivity property, eq. (88), where $n = \alpha N$, where $0 \le \alpha \le 1$, and $N$ goes to infinity.

The (Hausdorff) distance between two sets $A$ and $B$ is defined as

$$d(A, B) = \max\{\sup[d(a, B) : a \in A],\ \sup[d(b, A) : b \in B]\}, \tag{95}$$

where the distance between a set $A$ and a point $b$ is given by

$$d(b, A) = \inf[\|a - b\| : a \in A]. \tag{96}$$
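For finite point sets the suprema and infima in (95) and (96) become maxima and minima, which makes the definition easy to evaluate directly. The following is a minimal sketch under that assumption, using the Euclidean norm; the function names `point_to_set` and `hausdorff` are illustrative and do not come from the paper.

```python
import math

def point_to_set(b, A):
    """d(b, A) from (96) for a finite set A of points in the plane."""
    return min(math.dist(a, b) for a in A)

def hausdorff(A, B):
    """d(A, B) from (95) for finite sets A and B of points in the plane."""
    worst_a = max(point_to_set(a, B) for a in A)
    worst_b = max(point_to_set(b, A) for b in B)
    return max(worst_a, worst_b)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
print(hausdorff(A, B))
```

Here every point of `A` also lies in `B`, so only the extra point `(1.0, 2.0)` contributes, and the distance is its distance to the nearest point of `A`.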
Lemma 25: If $\lim_{n \to \infty} d(A_n, B_n) = 0$ then

$$\limsup A_n = \limsup B_n, \qquad \liminf A_n = \liminf B_n. \tag{97}$$

Proof: The proof is straightforward. Given a sequence $\{a_k\}$, $a_k \in A_{n_k}$, that converges to $a$, we construct a sequence $\{b_k\}$ by finding a point in $B_{n_k}$ that is at a distance less than $\frac{1}{k} + d(a_k, B_{n_k})$ from $a_k$. Since the distance between the sets goes to zero, $\lim b_k = \lim a_k = a$, and from the definitions of limits of sets it follows that (97) holds. ■

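Appendix V
Proof of Lemma 22

Recall the definitions of $\underline{\mathcal{R}}_n$ and $\mathcal{R}_n$ in (19) and (20), respectively.
Lemma 22 states that

$$(n + l)\,\underline{\mathcal{R}}_{n+l} \supseteq n\,\underline{\mathcal{R}}_n + l\,\underline{\mathcal{R}}_l, \tag{98}$$

and for an indecomposable FS-MAC without feedback $\lim_{n \to \infty} \underline{\mathcal{R}}_n = \lim_{n \to \infty} \mathcal{R}_n$.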

Proof of Lemma 22: We notice that if a sequence of sets is sup-additive then the sequence of the convex hulls of the sets is also sup-additive. Hence, it is enough to prove the sup-additivity of the sequence $\underline{\mathcal{R}}_n$ without the random variable $W$, whose only role is to convexify the regions.

The set $\underline{\mathcal{R}}_n$ is defined by three expressions that involve directed information. Because each expression is sup-additive, the whole set is sup-additive. We prove that the first expression, i.e., $\min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) - \log|\mathcal{S}|$, is sup-additive (the proofs of the sup-additivity of the other expressions are similar and therefore omitted).



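$$\begin{aligned}
\min_{s_0} I(X_1^{n+l} \to Y^{n+l} \| X_2^{n+l}, s_0)
&\overset{(a)}{\ge} \min_{s_0} \sum_{i=1}^{n} I(Y_i; X_1^i \mid Y^{i-1}, X_2^i, s_0) + \min_{s_0} \sum_{j=n+1}^{n+l} I(Y_j; X_1^j \mid Y^{j-1}, X_2^j, s_0) \\
&\overset{(b)}{\ge} \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) + \min_{s_0} \sum_{j=n+1}^{n+l} I(Y_j; X_{1,n+1}^j \mid Y^{j-1}, X_2^j, s_0) \\
&\overset{(c)}{\ge} \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) + \min_{s_0} \sum_{j=n+1}^{n+l} I(Y_j; X_{1,n+1}^j \mid Y^{j-1}, X_2^j, S_n, s_0) - \log|\mathcal{S}| \\
&= \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) + \min_{s_0} \sum_{s_n} P(s_n \mid s_0) \sum_{j=n+1}^{n+l} I(Y_j; X_{1,n+1}^j \mid Y^{j-1}, X_2^j, s_n, s_0) - \log|\mathcal{S}| \\
&\ge \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) + \min_{s_n} \sum_{j=n+1}^{n+l} I(Y_j; X_{1,n+1}^j \mid Y_{n+1}^{j-1}, X_{2,n+1}^j, s_n) - \log|\mathcal{S}| \\
&\overset{(d)}{=} \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) + \min_{s_0} I(X_1^l \to Y^l \| X_2^l, s_0) - \log|\mathcal{S}|.
\end{aligned} \tag{99}$$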

(a) follows from the definition of the directed information and the fact that $\min_s [f(s) + g(s)] \ge \min_s f(s) + \min_s g(s)$,

(b) follows from the fact that $I(X; Y, Z) \ge I(X; Y)$,

(c) follows from Lemma 2, which states that conditioning on $S_n$ can change the expression by at most $\log|\mathcal{S}|$,

(d) follows from the stationarity of the channel.
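The elementary inequality behind step (a) can be checked on a small example. The sketch below assumes a hypothetical three-state alphabet and made-up values for the two per-state terms; it only illustrates that minimizing each summand separately can never exceed the minimum of the sum.

```python
def min_of_sum(f, g, states):
    """min over s of f(s) + g(s)."""
    return min(f[s] + g[s] for s in states)

def sum_of_mins(f, g, states):
    """(min over s of f(s)) + (min over s of g(s))."""
    return min(f[s] for s in states) + min(g[s] for s in states)

# Hypothetical per-state values standing in for the two partial sums in (99).
states = ["s0", "s1", "s2"]
f = {"s0": 0.9, "s1": 0.4, "s2": 0.7}
g = {"s0": 0.2, "s1": 0.8, "s2": 0.5}

lhs = min_of_sum(f, g, states)
rhs = sum_of_mins(f, g, states)
print(lhs, rhs, lhs >= rhs)
```

The gap between the two sides is strict here because the two terms attain their minima at different states, which is exactly the situation the separate minimizations in (99) allow for.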

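According to Lemma 23, since the sequence $\{\underline{\mathcal{R}}_n\}$ is sup-additive the limit exists. In the rest of the proof we show that $\lim_{n \to \infty} \underline{\mathcal{R}}_n = \lim_{n \to \infty} \mathcal{R}_n$. The terms of the region $\underline{\mathcal{R}}_n$ have an auxiliary random variable $W$ whose only role is to convexify the region. Let us denote by $\underline{\mathcal{R}}_n^0$ the same region as $\underline{\mathcal{R}}_n$ where $W$ is restricted to be null. We show first that restricting $W$ to being null does not influence the limit, i.e., $\lim_{n \to \infty} \underline{\mathcal{R}}_n = \lim_{n \to \infty} \underline{\mathcal{R}}_n^0$. In the first half of the proof we showed that $\underline{\mathcal{R}}_n^0$ is sup-additive. Using this fact, we show now that any convex combination with rational weights $\left(\frac{m}{k}, \frac{k-m}{k}\right)$, where $0 < m < k$ are integers, of any two points from $\underline{\mathcal{R}}_n^0$ is in $\underline{\mathcal{R}}_{nk}^0$:

$$\underline{\mathcal{R}}_{nk}^0 \supseteq \frac{m}{k}\,\underline{\mathcal{R}}_{nm}^0 + \frac{k-m}{k}\,\underline{\mathcal{R}}_{n(k-m)}^0 \supseteq \frac{m}{k}\,\underline{\mathcal{R}}_n^0 + \frac{k-m}{k}\,\underline{\mathcal{R}}_n^0. \tag{100}$$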



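The left and the right inclusions in (100) are due to the sup-additivity of $\underline{\mathcal{R}}_n^0$. The left inclusion is from the definition of the sup-additivity and the right is due to the fact that sup-additivity of $\underline{\mathcal{R}}_n^0$ also implies that for any two positive integers $m, n$, $\underline{\mathcal{R}}_{mn}^0 \supseteq \underline{\mathcal{R}}_n^0$ (this is shown by induction in (90) and (91)). From (100) we can deduce that for any $\epsilon > 0$ we can find a $k(\epsilon)$ such that $\underline{\mathcal{R}}_n \subseteq \underline{\mathcal{R}}_{nk}^0 + \epsilon$. This fact, together with the trivial fact that $\underline{\mathcal{R}}_n \supseteq \underline{\mathcal{R}}_n^0$ and the fact that the limits of both sequences exist, allows us to deduce that the limits are the same, i.e., $\lim_{n \to \infty} \underline{\mathcal{R}}_n = \lim_{n \to \infty} \underline{\mathcal{R}}_n^0$.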

We conclude the proof by showing that, for any input distribution $Q(x_1^n)Q(x_2^n)$, the difference between the terms in the inequalities of $\{\underline{\mathcal{R}}_n^0\}$ and $\{\mathcal{R}_n\}$ goes to zero as $n \to \infty$; hence the distance between the sets of the sequences goes to zero as $n \to \infty$ and, by Lemma 25, the limits of the sequences are the same.



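$$\begin{aligned}
&\lim_{n \to \infty} \frac{1}{n} \left| I(X_1^n \to Y^n \| X_2^n) - \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) + \log|\mathcal{S}| \right| \\
&\quad\overset{(a)}{\le} \lim_{n \to \infty} \frac{1}{n} \left( \left| I(X_1^n \to Y^n \| X_2^n, S_0) - \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) \right| + 2\log|\mathcal{S}| \right) \\
&\quad= \lim_{n \to \infty} \frac{1}{n} \left( I(X_1^n \to Y^n \| X_2^n, S_0) - \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) \right) \\
&\quad\overset{(b)}{\le} \lim_{n \to \infty} \frac{1}{n} \left( \max_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) - \min_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) \right) \\
&\quad\overset{(c)}{=} 0.
\end{aligned} \tag{101}$$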



(a) follows from Lemma 2 and the triangle inequality.

(b) follows from the fact that $\max_{s_0} I(X_1^n \to Y^n \| X_2^n, s_0) \ge I(X_1^n \to Y^n \| X_2^n, S_0)$.

(c) follows from Lemma 21, which states this equality for an indecomposable FS-MAC without feedback (recall also that directed information equals mutual information in the absence of feedback). ■

Appendix VI
Proof of Theorem 16



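$$\begin{aligned}
E[P_{e1}] &= \sum_{y^N} \sum_{x_1^N, x_2^N} P(x_1^N, x_2^N, y^N)\, P[\mathrm{error}_1 \mid m_1, m_2, x_1^N, x_2^N, y^N] \\
&= \sum_{y^N} \sum_{x_1^N, x_2^N} Q(x_1^N \| z_1^{N-1})\, Q(x_2^N \| z_2^{N-1})\, P(y^N \| x_1^N, x_2^N)\, P[\mathrm{error}_1 \mid m_1, m_2, x_1^N, x_2^N, y^N],
\end{aligned} \tag{102}$$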

where $P[\mathrm{error}_1 \mid m_1, m_2, x_1^N, x_2^N, y^N]$ is the error probability of decoding $m_1$ given that $m_2$ is decoded correctly. Throughout the remainder of the proof we fix the messages $m_1, m_2$. For a given tuple $(m_1, m_2, x_1^N, x_2^N, y^N)$ define the event $A_{m_1'}$, for each $m_1' \ne m_1$, as the event that the message $m_1'$ is selected in such a way that $P(y^N \mid m_1', m_2) \ge P(y^N \mid m_1, m_2)$, which is the same as $P(y^N \| x_1'^N, x_2^N) \ge P(y^N \| x_1^N, x_2^N)$, where $x_1'^N$ is a shorthand notation for $x_1^N(m_1', z_1^{N-1}(y^{N-1}))$ and $x_l^N$ is a shorthand notation for $x_l^N(m_l, z_l^{N-1}(y^{N-1}))$ for $l = 1, 2$. From the definition of $A_{m_1'}$ we have



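$$\begin{aligned}
P(A_{m_1'} \mid m_1, m_2, x_1^N, x_2^N, y^N) &= \sum_{x_1'^N} Q(x_1'^N \| z_1^{N-1})\, 1\!\left[ P(y^N \| x_1'^N, x_2^N) \ge P(y^N \| x_1^N, x_2^N) \right] \\
&\le \sum_{x_1'^N} Q(x_1'^N \| z_1^{N-1}) \left[ \frac{P(y^N \| x_1'^N, x_2^N)}{P(y^N \| x_1^N, x_2^N)} \right]^{s}, \quad \text{any } s > 0,
\end{aligned} \tag{103}$$

where $1[x]$ denotes the indicator function.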

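$$\begin{aligned}
P[\mathrm{error}_1 \mid m_1, m_2, x_1^N, x_2^N, y^N] &= P\Big( \bigcup_{m_1' \ne m_1} A_{m_1'} \,\Big|\, m_1, m_2, x_1^N, x_2^N, y^N \Big) \\
&\le \min\Big\{ \sum_{m_1' \ne m_1} P(A_{m_1'} \mid m_1, m_2, x_1^N, x_2^N, y^N),\ 1 \Big\} \\
&\le \Big[ \sum_{m_1' \ne m_1} P(A_{m_1'} \mid m_1, m_2, x_1^N, x_2^N, y^N) \Big]^{\rho}, \quad \text{any } 0 \le \rho \le 1 \\
&\le \Big[ (M_1 - 1) \sum_{x_1'^N} Q(x_1'^N \| z_1^{N-1}) \Big[ \frac{P(y^N \| x_1'^N, x_2^N)}{P(y^N \| x_1^N, x_2^N)} \Big]^{s} \Big]^{\rho}, \quad 0 \le \rho \le 1,\ s > 0,
\end{aligned} \tag{104}$$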



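where the last inequality is due to inequality (103). By substituting inequality (104) in eq. (102) we obtain:

$$E[P_{e1}] \le (M_1 - 1)^{\rho} \sum_{y^N} \sum_{x_1^N, x_2^N} Q(x_1^N \| z_1^{N-1})\, Q(x_2^N \| z_2^{N-1})\, P(y^N \| x_1^N, x_2^N)^{1 - s\rho} \Big[ \sum_{x_1'^N} Q(x_1'^N \| z_1^{N-1})\, P(y^N \| x_1'^N, x_2^N)^{s} \Big]^{\rho}.$$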



By substituting $s = 1/(1 + \rho)$, and recognizing that $x_1'$ is a dummy variable of summation, we obtain eq. (37) and complete the proof of the bound on $E[P_{e1}]$.
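The two elementary bounds used in (103) and (104) can be verified numerically on a grid. The sketch below is only an illustration of those scalar inequalities, with hypothetical function names and grid values; it does not evaluate the channel quantities themselves.

```python
def indicator_bound_holds(p_alt, p_true, s, tol=1e-12):
    """Check 1[p_alt >= p_true] <= (p_alt / p_true)**s, as used in (103)."""
    indicator = 1.0 if p_alt >= p_true else 0.0
    return indicator <= (p_alt / p_true) ** s + tol

def union_bound_holds(x, rho, tol=1e-12):
    """Check min(x, 1) <= x**rho for x >= 0 and 0 <= rho <= 1, as used in (104)."""
    return min(x, 1.0) <= x ** rho + tol

# Hypothetical grids of likelihood values, union-bound sums and exponents.
probs = [0.05, 0.2, 0.5, 0.9]
sums = [0.0, 0.3, 1.0, 2.5, 10.0]
s_values = [0.25, 0.5, 1.0, 2.0]
rho_values = [0.0, 0.25, 0.5, 1.0]

all_indicator = all(indicator_bound_holds(pa, pt, s)
                    for pa in probs for pt in probs for s in s_values)
all_union = all(union_bound_holds(x, rho) for x in sums for rho in rho_values)
print(all_indicator, all_union)
```

Both checks rest on the same observation: a ratio of at least one stays at least one under a positive power, and a number in $[0, 1]$ can only grow when raised to a power in $[0, 1]$.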

The proof for bounding $E[P_{e2}]$ is identical to the proof that is given here for $E[P_{e1}]$, up to exchanging the indices. For $E[P_{e3}]$ the upper bound is identical to the case of the point-to-point channel with an input $(x_1^N, x_2^N)$, as proven in [27], where the union bound which appears here in eq. (104) consists of $(M_1 - 1)(M_2 - 1)$ terms.